Henry

Thanks for the explanations. It's a bit clearer now. I'm still not sure
about how ZnUrl>>retrieveContents manages to decode correctly in this case;
I'm sure I recall Sven saying it didn't (and in his view shouldn't) look at
the HTTP declarations in the header. There is also the mystery of how the
string reader in the XML-Parser package (XMLURI>>get) does the same trick,
even though XMLHTMLParser>>parseURL: presumably uses it and fails.

However, all of these are second-order problems. It all begins because the
Corriere web site does strange things with encoding, including using a UTF-8
character in a page encoded as ISO-8859-1, as Paul pointed out. In any case,
reading the page as a string and then parsing it solves my problem, so I
shall stick to that as a standard procedure. Most importantly, I don't think
there is any indication of a problem in the XML package for Monty to worry
about.
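
For the record, the procedure I mean looks roughly like the sketch below.
The URL is only a placeholder (not the actual Corriere page), and I am
assuming XMLHTMLParser class>>parse: is the right entry point for parsing
an already-fetched string; treat it as an illustration rather than a
recommended recipe.

```smalltalk
| html doc |
"Placeholder URL, for illustration only."
"Step 1: fetch the page as a String; this is the step that decoded correctly for me."
html := 'http://example.com/page.html' asZnUrl retrieveContents.
"Step 2: parse the string, instead of calling XMLHTMLParser>>parseURL: directly."
doc := XMLHTMLParser parse: html.
```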

Thanks again

Peter



--
Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
