XML expects a prolog in the document itself declaring the encoding; if it is
absent, the standard specifies UTF-8 as the default.
So when you use an XML parser to parse an HTML page, it disregards any
encoding given in the HTTP headers, treats the contents as an XML document
with a missing prolog, and tries to parse it as UTF-8.
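
To make the failure concrete, here is a small sketch in Python rather than
Pharo, using the standard library's expat-based parser as a stand-in. The
sample markup and the helper name parse_as_xml are made up for illustration;
the point is only that ISO-8859-1 bytes without a prolog get rejected, while
the same bytes with a prolog declaring the encoding parse fine.

```python
import xml.etree.ElementTree as ET

# Hypothetical page body: ISO-8859-1 bytes, no XML prolog.
raw = "<p>café</p>".encode("iso-8859-1")

# Same content, but with a prolog that declares the encoding.
with_prolog = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + raw


def parse_as_xml(data):
    """Return the root element's text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# Without a prolog the parser assumes UTF-8, and the lone 0xE9 byte is invalid.
result_raw = parse_as_xml(raw)

# With the prolog, the parser knows how to decode the bytes.
result_with_prolog = parse_as_xml(with_prolog)
```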

When you use ZnUrl getContents, however, it respects the charset given in
the HTTP header, which correctly identifies the contents as ISO-8859-1, and
lets you read them correctly into an internal string.
When you subsequently parse that internal string, the XML parser does not
attempt any conversion, and therefore it works.
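
The same two-step approach (decode using the HTTP charset first, then parse
the resulting string) can be sketched in Python. This is an analogy, not what
Zinc does internally; the header value, the sample body, and the helper name
charset_from_content_type are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from email.message import Message


def charset_from_content_type(value, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset() or default


# Hypothetical response: ISO-8859-1 body plus the matching HTTP header.
body = "<p>café</p>".encode("iso-8859-1")
header = "text/html; charset=ISO-8859-1"

# Step 1: decode the bytes using the charset the server announced.
charset = charset_from_content_type(header)
text = body.decode(charset)

# Step 2: parse the already-decoded string; no byte-level guessing is needed.
root = ET.fromstring(text)
```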

Cheers,
Henry


