"Diez B. Roggisch" <de...@nospam.web.de> wrote in message
news:7jub5rf37div...@mid.uni-berlin.de...
[snip]
This is weird. I looked at the site in Firefox - and it was displayed
correctly, including umlauts. The info dialog claims the page is UTF-8,
and the XML itself says so as well (implicitly, through the missing
declaration of an encoding) - but it clearly is *not* UTF-8.
One would expect google to be better at this...
Diez
According to the XML 1.0 specification:
"Although an XML processor is required to read only entities in the UTF-8
and UTF-16 encodings, it is recognized that other encodings are used around
the world, and it may be desired for XML processors to read entities that
use them. In the absence of external character encoding information (such as
MIME headers), parsed entities which are stored in an encoding other than
UTF-8 or UTF-16 must begin with a text declaration..."
So UTF-8 and UTF-16 are the defaults only when there is no XML declaration
*and* no external encoding information. But here we do have external
character encoding information:
>>> import urllib  # Python 2
>>> f = urllib.urlopen("http://www.google.de/ig/api?weather=Muenchen")
>>> f.headers.dict['content-type']
'text/xml; charset=ISO-8859-1'
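So the page seems correct: the HTTP header declares ISO-8859-1, and that
external information stands in for the missing encoding declaration.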
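
For what it's worth, here is a rough sketch of that lookup order in Python 3:
charset from the Content-Type header first, then a BOM or XML declaration,
and only then the UTF-8 default. The function name pick_encoding and the
sample body are made up for illustration (the body is not the actual Google
response), and the real rules in the spec have more cases than this.

```python
import re

def pick_encoding(content_type, body):
    """Guess the encoding of an XML document fetched over HTTP (simplified)."""
    # 1. External information: charset parameter of the Content-Type header.
    m = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', content_type or "", re.I)
    if m:
        return m.group(1)
    # 2. Byte order mark at the start of the document.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    # 3. encoding="..." in the XML declaration, if there is one.
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][\w.-]*)["\']', body)
    if m:
        return m.group(1).decode("ascii")
    # 4. Nothing declared anywhere: fall back to UTF-8.
    return "utf-8"

# Hypothetical payload: Latin-1 bytes with no XML declaration.
body = '<weather city="M\u00fcnchen"/>'.encode("iso-8859-1")
header = "text/xml; charset=ISO-8859-1"
text = body.decode(pick_encoding(header, body))
```

Without the header the sketch falls back to UTF-8, and decoding the same
bytes then fails with UnicodeDecodeError, which matches what Diez saw.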
-Mark