"Diez B. Roggisch" <de...@nospam.web.de> wrote in message news:7jub5rf37div...@mid.uni-berlin.de...
[snip]
This is weird. I looked at the site in Firefox, and it was displayed correctly, including umlauts. The info dialog claims the page is UTF-8, and the XML itself says so as well (implicitly, through the missing encoding declaration), but it clearly is *not* UTF-8.

One would expect Google to be better at this...

Diez

According to the XML 1.0 specification:

"Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration..."

So UTF-8 and UTF-16 are the only encodings a processor may assume when there is neither an XML declaration nor external encoding information. But here we do have external character encoding information, in the HTTP headers:

>>> import urllib
>>> f = urllib.urlopen("http://www.google.de/ig/api?weather=Muenchen")
>>> f.headers.dict['content-type']
'text/xml; charset=ISO-8859-1'

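So the page seems correct: the Content-Type header declares ISO-8859-1, and that external declaration is what a client should use to decode the body.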
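
For what it's worth, here is a rough sketch of how a client could honour that header when decoding the body. It is written for Python 3, not the Python 2 session above. The helper names (charset_from_content_type, decode_body) are made up for illustration, and the sample bytes are constructed locally, not fetched from the Google URL. It assumes the only external encoding information is the charset parameter of Content-Type, and it falls back to UTF-8 when that parameter is missing.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper: it reuses the stdlib email parser to read the
    header parameters instead of splitting the string by hand.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset() or default


def decode_body(raw_bytes, content_type):
    """Decode a response body using the charset declared in the header."""
    return raw_bytes.decode(charset_from_content_type(content_type))


# Sample data built locally (an assumption, not the real server response).
header = "text/xml; charset=ISO-8859-1"
raw = "München".encode("iso-8859-1")

text = decode_body(raw, header)
```

Decoding the same bytes as UTF-8 would raise UnicodeDecodeError, which matches the observation that the page "clearly is not UTF-8" even though it is correctly labelled.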

-Mark


--
http://mail.python.org/mailman/listinfo/python-list
