On 7/26/07, Stefan Scholl <[EMAIL PROTECTED]> wrote:
> Stefan Behnel <[EMAIL PROTECTED]> wrote:
> > The XML is *not* well-formed if you pass Python unicode instead of a byte
> > encoded string. Read the XML spec.
>
> Pointers, please.
>
> Last time I read that part of the spec was when a customer's
> consulting company switched to ISO-8859-15 without saying
> something beforehand. The old code (PHP) I have to maintain
> couldn't deal with it.
>
> It was wrong to switch encoding without telling somebody about
> it. And a XML processor isn't required to support ISO-8859-15.
> But I thought it was too embarrassing not to support this
> encoding. I fixed that part without making a fuss.
>
>
> A Python XML processor that can't handle the own encoding is
> embarrassing. It isn't required to support it. It would be OK if
> it wouldn't support UTF-7. But a parseString() method that
> doesn't want Python strings? No way!
>

Of course it can handle its own encoding. But you're passing incorrect
values to it, the same way that passing '10' to a function expecting
an int is going to fail.

cStringIO in Python 2.4 is buggy - when passed a unicode object, it
silently uses the (platform and compilation dependent) internal buffer
of the unicode object. In 2.5 this was corrected to be consistent with
all other unicode/str conversions: it encodes the object using the
default encoding, and fails when that's not possible (as in your
example).

It's not that your code worked on 2.4, and 2.5 broke it - the 2.4 code
was subtly buggy and 2.5 is preventing you from having that bug.

XML is not a string. It's a specific type of bytestream. If you want
to work with XML, then generate well-formed XML in the correct
encoding. There's no reason you should have an XML document (as
opposed to values extracted from that document) in unicode objects at
all.
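
To make the bytestream point concrete, here is a rough sketch of what I mean. It is not your code: the sample document is made up, and I'm using xml.etree.ElementTree (in the standard library from 2.5 on) purely for illustration. The idea is to encode the document into the encoding its XML declaration names, hand those bytes to the parser, and only treat the values you pull out as text.

```python
import xml.etree.ElementTree as ET

# A made-up document held as text. This is the situation to avoid
# passing straight to a parser: text is not a bytestream.
text = u'<?xml version="1.0" encoding="ISO-8859-1"?><name>Caf\u00e9</name>'

# Encode explicitly, using the encoding the XML declaration claims,
# so the bytes and the declaration agree.
data = text.encode("iso-8859-1")

# Parse the bytes. The parser reads the declaration and decodes for you.
root = ET.fromstring(data)

# Values extracted from the document are where text belongs.
value = root.text
```

If the bytes and the declaration disagree, the parser is entitled to reject the document or to hand back the wrong characters, which is why the explicit encode step matters.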