XML documents can contain non-ASCII characters, such as the Norwegian æ, ø, å or the French ê, è, é.
To avoid errors, specify the XML encoding, or save XML files as Unicode.
XML Encoding Errors
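If you load an XML document, you can get two different errors indicating encoding problems:
An invalid character was found in text content.
You get this error if your XML contains non-ASCII characters, and the file was saved as single-byte ANSI (or ASCII) with no encoding specified.
Switch from current encoding to specified encoding not supported.
You get this error if your XML file was saved as double-byte Unicode (UTF-16) with a single-byte encoding (Windows-1252, ISO-8859-1, UTF-8) specified, or the other way around.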
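The "invalid character" case can be reproduced outside a browser. The sketch below is only an illustration, assuming Python's standard xml.etree.ElementTree parser and assuming that "ANSI" means windows-1252; the wording of the error differs from the message quoted above, and the helper name try_parse is made up for this example.

```python
import xml.etree.ElementTree as ET

# The document text, with no encoding declaration.
xml_text = '<?xml version="1.0"?><note><message>Norwegian: æøå</message></note>'

# Bytes as a single-byte ANSI editor would store them (windows-1252 assumed).
ansi_bytes = xml_text.encode("windows-1252")


def try_parse(data: bytes) -> str:
    """Return the message text, or a description of the parser error."""
    try:
        return ET.fromstring(data).findtext("message")
    except ET.ParseError as err:
        return f"parse error: {err}"


# Without a declaration the parser assumes UTF-8, so the ANSI bytes are rejected.
ansi_result = try_parse(ansi_bytes)
# The same text stored as UTF-8 parses without complaint.
utf8_result = try_parse(xml_text.encode("utf-8"))

print(ansi_result)
print(utf8_result)
```

The document text is identical in both calls; only the bytes on disk differ, which is why the problem is easy to miss in an editor.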
Windows Notepad
Windows Notepad has traditionally saved files as single-byte ANSI by default (ANSI is an 8-bit extension of ASCII; newer Windows versions default to UTF-8).
If you select "Save as...", you can specify double-byte Unicode (UTF-16).
Save the XML file below as Unicode (note that the document does not contain any encoding attribute):
<?xml version="1.0"?>
<note>
<from>Umair</from>
<to>Shah</to>
<message>Norwegian: æøå. French: êèé</message>
</note>
The file above, note_encode_none_u.xml, will NOT generate an error, because the parser can recognize UTF-16 from the byte order mark at the start of the file. But if you declare a single-byte encoding, it will.
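As a rough check of this behaviour, the following sketch stores the same note as UTF-16 and parses it with no encoding declaration. It assumes Python's xml.etree.ElementTree rather than the browser parser the text describes, and it relies on Python's "utf-16" codec writing a byte order mark, as Notepad's Unicode option does.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<?xml version="1.0"?>\n'
    "<note>\n"
    "<from>Umair</from>\n"
    "<to>Shah</to>\n"
    "<message>Norwegian: æøå. French: êèé</message>\n"
    "</note>"
)

# Python's "utf-16" codec prepends a byte order mark (BOM) to the output.
utf16_bytes = xml_text.encode("utf-16")
has_bom = utf16_bytes[:2] in (b"\xff\xfe", b"\xfe\xff")

# The parser detects UTF-16 from the BOM, so no declaration is needed.
message = ET.fromstring(utf16_bytes).findtext("message")

print(has_bom)
print(message)
```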
With the file still saved as UTF-16, the following encoding declaration will give an error message:
Example:
<?xml version="1.0" encoding="windows-1252"?>
The following encoding declaration will also give an error message:
Example:
<?xml version="1.0" encoding="ISO-8859-1"?>
The following encoding declaration will also give an error message:
Example:
<?xml version="1.0" encoding="UTF-8"?>
The following encoding declaration matches the saved file and will NOT give an error:
Example:
<?xml version="1.0" encoding="UTF-16"?>
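The mismatch can be sketched in code as well. This example assumes Python's xml.etree.ElementTree, so the error text differs from the browser messages; the helper name parses_as_utf16 is hypothetical, and windows-1252 is left out because this parser handles that name through a separate code path.

```python
import xml.etree.ElementTree as ET

body = "<note><message>Norwegian: æøå. French: êèé</message></note>"


def parses_as_utf16(declared: str) -> bool:
    """Store the document as UTF-16 (with BOM) and report whether it parses."""
    text = f'<?xml version="1.0" encoding="{declared}"?>{body}'
    try:
        ET.fromstring(text.encode("utf-16"))
        return True
    except ET.ParseError:
        return False


# The bytes are always UTF-16; only the declared encoding changes.
results = {
    name: parses_as_utf16(name)
    for name in ("ISO-8859-1", "UTF-8", "UTF-16")
}
print(results)
```

Only the declaration that agrees with the stored bytes is expected to parse.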
Conclusion
- Always use the encoding attribute
- Use an editor that lets you choose the encoding when saving
- Make sure you know what encoding the editor uses
- Declare that same encoding in your encoding attribute
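One way to follow these rules is to let a library write both the declaration and the bytes, so the two cannot drift apart. The sketch below is one possible approach, assuming Python's xml.etree.ElementTree; the helper name serialize is made up for this example.

```python
import io
import xml.etree.ElementTree as ET

note = ET.Element("note")
ET.SubElement(note, "message").text = "Norwegian: æøå. French: êèé"


def serialize(root: ET.Element, encoding: str) -> bytes:
    """Write the tree with a declaration that names the encoding actually used."""
    buffer = io.BytesIO()
    ET.ElementTree(root).write(buffer, encoding=encoding, xml_declaration=True)
    return buffer.getvalue()


utf8_doc = serialize(note, "utf-8")
latin1_doc = serialize(note, "iso-8859-1")

# Both documents parse back to the same text, although their bytes differ.
print(ET.fromstring(utf8_doc).findtext("message"))
print(ET.fromstring(latin1_doc).findtext("message"))
```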