Stefan Scholl wrote:
> Stefan Behnel <[EMAIL PROTECTED]> wrote:
>> Stefan Scholl wrote:
>>> Stefan Behnel <[EMAIL PROTECTED]> wrote:
>>>> Stefan Scholl wrote:
>>>>> Stefan Behnel <[EMAIL PROTECTED]> wrote:
>>>>>> Stefan Scholl wrote:
>>>>>>> Well, http://docs.python.org/lib/module-xml.sax.html is missing
>>>>>>> the fact, that I can't use Unicode with parseString().
>>>>>>>
>>>>>>> This parseString() uses cStringIO.
>>>>>> Well, Python unicode is not a valid *byte* encoding for XML.
>>>>>>
>>>>>> lxml.etree can parse unicode, if you really want, but otherwise,
>>>>>> you should maybe stick to well-formed XML.
>>>>> The XML is well-formed. Works perfect in Python 2.4 with Python
>>>>> unicode and Python sax parser.
>>>> The XML is *not* well-formed if you pass Python unicode instead of
>>>> a byte encoded string. Read the XML spec.
>>>>
>>>> It would be well-formed if you added the proper XML declaration,
>>>> but that is system specific (UCS-4 or UTF-16, BE or LE). So don't
>>>> even try.
>>> Who cares? I'm not calling any external tools.
>> XML cares. If you want to work with something that is not XML, do not
>> expect XML tools to help you do it. XML tools work with XML, and there
>> is a spec that says what XML is. Your string is not XML.
>
> This isn't some sophisticated XML tool that tells me the string
> is wrong. It's a changed behavior of cStringIO that throws an
> exception. While I'm just using the method parseString() of
> xml.sax.

All I'm saying is that parseString() is perfectly right in using cStringIO,
as cStringIO supports every possible incarnation of serialised XML. It was
documented that cStringIO does not support Unicode and it doesn't:

$ python2.4
Python 2.4.4 (#2, Apr 12 2007, 21:03:11)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write(u"\uf852")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\uf852' in position 0: ordinal not in range(128)

What a surprise.

Stefan
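
For illustration, here is a minimal sketch of handing the parser serialised XML, that is, bytes in a declared encoding, rather than a text string. It assumes a current Python 3 interpreter, not the 2.4/2.5 interpreters discussed in this thread, and the names TextCollector and parse_text are hypothetical helpers made up for the example.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data it sees."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them.
        self.chunks.append(content)


def parse_text(document):
    """Serialise a text string to UTF-8 bytes, then parse those bytes."""
    data = document.encode("utf-8")  # bytes matching the XML declaration
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


# The same character as in the session above, now sent as encoded bytes.
result = parse_text('<?xml version="1.0" encoding="utf-8"?><root>\uf852</root>')
```

The explicit encode step is the whole point: the encoding named in the XML declaration and the bytes given to the parser agree, so nothing depends on how the interpreter stores text internally.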