Re: umlauts

Mark Tolonen Sat, 17 Oct 2009 13:20:44 -0700

"Diez B. Roggisch" <de...@nospam.web.de> wrote in message news:7jub5rf37div...@mid.uni-berlin.de...

[snip]

This is weird. I looked at the site in Firefox - and it was displayed correctly, including umlauts. Bringing up the info-dialog claims the page is UTF-8, the XML itself says so as well (implicit, through the missing declaration of an encoding) - but it clearly is *not* utf-8.
One would expect google to be better at this...

Diez


According to the XML 1.0 specification:

"Although an XML processor is required to read only entities in the UTF-8 and UTF-16 encodings, it is recognized that other encodings are used around the world, and it may be desired for XML processors to read entities that use them. In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 must begin with a text declaration..."

So UTF-8 and UTF-16 are the defaults supported without an XML declaration in the absence of external encoding information. But we have external character encoding information:

import urllib
f = urllib.urlopen("http://www.google.de/ig/api?weather=Muenchen")
f.headers.dict['content-type']

'text/xml; charset=ISO-8859-1'

So the page seems correct.
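
To make the precedence concrete, here is a rough sketch (Python 3, not the code I ran above) of how a client might pick the encoding: external information from the Content-Type header first, then an XML declaration, then the UTF-8 default. The function names and the sample body are made up for illustration, and the sketch deliberately ignores BOM and UTF-16 detection, so treat it as a simplification rather than a full implementation of the spec's rules.

```python
import re
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


def pick_encoding(content_type, body):
    """Hypothetical helper: header charset, then XML declaration, then UTF-8."""
    charset = charset_from_content_type(content_type)
    if charset:
        return charset
    # Simplified check for an encoding in the XML declaration at the start.
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', body)
    if match:
        return match.group(1).decode("ascii").lower()
    # No external information and no declaration: assume UTF-8
    # (UTF-16 with a BOM is not handled in this sketch).
    return "utf-8"


# Made-up sample body, encoded the way the server's header claims.
body = '<city data="München"/>'.encode("iso-8859-1")
encoding = pick_encoding("text/xml; charset=ISO-8859-1", body)
text = body.decode(encoding)
```

With the header taken into account the umlaut decodes correctly; decoding the same bytes as UTF-8 raises UnicodeDecodeError, which matches what Diez saw.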

-Mark


--
http://mail.python.org/mailman/listinfo/python-list
