Stefan Scholl wrote:
> Stefan Behnel <[EMAIL PROTECTED]> wrote:
>> The XML is *not* well-formed if you pass Python unicode instead of a byte
>> encoded string. Read the XML spec.
>
> Pointers, please.

There you have it:

http://www.w3.org/TR/xml/#charencoding

"""
In the absence of information provided by an external transport protocol
(e.g. HTTP or MIME), it is a *fatal error* for an entity including an
encoding declaration to be presented to the XML processor in an encoding
other than that named in the declaration, or for an entity which begins with
neither a Byte Order Mark nor an encoding declaration to use an encoding
other than UTF-8.
"""

"""
Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin with
the Byte Order Mark ...
"""

Python does not use BOMs internally (although that again may be platform
specific). You might argue that there is some kind of "external transport
protocol" because it is a Python Unicode string (I used that excuse when I
implemented Unicode parsing support in lxml), but Python's Unicode objects
are strictly a character stream, not a byte stream. XML is only defined for
streams of bytes.

Also, there is no requirement for an XML processor to be able to parse
anything but UTF-8 and UTF-16, especially if the encoding is *undefined* and
*platform-specific*, as is that of a Python Unicode string.

Anything else I can help you understand?

Stefan
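
To illustrate the rule quoted from the spec, here is a minimal sketch. It makes a few assumptions that are not part of the original discussion: it uses the standard library's xml.etree.ElementTree (expat-based) rather than lxml, it targets Python 3 (where `str` is the Unicode type and `bytes` the byte string), and the sample document and the helper name `is_well_formed` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with one non-ASCII character (e-acute).
DOC = '<?xml version="1.0" encoding="{enc}"?><a>\u00e9</a>'


def is_well_formed(data: bytes) -> bool:
    """Return True if the stdlib parser accepts the given byte stream."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Declaration and actual byte encoding agree: accepted.
matching = DOC.format(enc="UTF-8").encode("utf-8")

# Declaration says UTF-8 but the bytes are Latin-1: rejected.
mismatched = DOC.format(enc="UTF-8").encode("latin-1")

# No declaration and no BOM: the parser assumes UTF-8, so Latin-1 bytes fail.
undeclared = "<a>\u00e9</a>".encode("latin-1")

# Latin-1 both declared and used: accepted by this particular parser,
# although the spec only requires support for UTF-8 and UTF-16.
declared_latin1 = DOC.format(enc="ISO-8859-1").encode("latin-1")

for name, data in [
    ("matching", matching),
    ("mismatched", mismatched),
    ("undeclared", undeclared),
    ("declared_latin1", declared_latin1),
]:
    print(name, is_well_formed(data))
```

The point is that the parser's verdict depends on the bytes it receives, not on the characters: the same character data is well-formed or not depending on how it was encoded and what the declaration claims.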