On Fri, 27 Jul 2007 06:47:48 +0200, Stefan Scholl wrote: > Chris Mellon <[EMAIL PROTECTED]> wrote: >> XML is not a string. It's a specific type of bytestream. If you want >> to work with XML, then generate well-formed XML in the correct >> encoding. There's no reason you should have an XML document (as >> opposed to values extracted from that document) in unicode objects at >> all. > > The affected method in xml.sax is called parseString()
Exactly. It's *not* called `parseUnicode()`. In Python you have the types `str` which is a bunch of bytes and `unicode` which is a character string that can hold unicode characters. How those are internally represented is an implementation detail. You shouldn't know or depend on the internal representation, other modules like XML parsers shouldn't do either. XML, the serialized form, is about bytes in some encoding. So this can only be stored in `str` objects. `unicode` is already decoded. If you want to feed `unicode` objects to an XML parser, simply encode it before passing it. The question remains why you have "serialized XML" as `unicode` in the first place as it is about bytes not unicode characters. Ciao, Marc 'BlackJack' Rintsch -- http://mail.python.org/mailman/listinfo/python-list