On Feb 5, 9:02 am, JKPeck <[EMAIL PROTECTED]> wrote:
> On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <[EMAIL PROTECTED]
> nomine.org> wrote:
> > -On [20080201 19:06], JKPeck ([EMAIL PROTECTED]) wrote:
> >
> > >In both of these cases, there are only plain, 7-bit ascii characters
> > >in the xml, and it really is valid utf-16 as far as I can tell.
> >
> > Did you mean to say that the only characters they used in the UTF-16 encoded
> > file are characters from the Basic Latin Unicode block?
>
> It appears that the root cause of this problem is indeed passing a
> Unicode XML string to xml.sax.parseString with an encoding declaration
> in the XML of utf-16. This works with the standard distribution on
> Windows.

It did NOT work for me with the standard 2.5.1 Windows distribution --
see the code + output that I posted.

> It does not work with ActiveState on Windows even though
> both distributions report > 64K for sys.maxunicode.
>
> So I don't know why the results are different, but the problem is
> solved by encoding the Unicode string into utf-16 before passing it to
> the parser.

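For reference, a minimal sketch of the workaround quoted above: encode the Unicode string to UTF-16 bytes first, then hand the bytes to xml.sax.parseString. The handler class TextCollector, the helper parse_utf16_document, and the sample document are all made up for illustration; none of them come from the code posted earlier in the thread. Behaviour with an un-encoded Unicode string differs between Python versions and builds, so this only shows the encode-first path.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.elements = []
        self.text = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        self.text.append(content)


def parse_utf16_document(xml_text):
    """Encode a Unicode XML string as UTF-16 before parsing it.

    The "utf-16" codec prepends a byte order mark, so the bytes agree
    with the encoding="utf-16" declaration inside the document.
    """
    handler = TextCollector()
    xml.sax.parseString(xml_text.encode("utf-16"), handler)
    return handler


# Invented sample document containing only Basic Latin characters.
doc = u'<?xml version="1.0" encoding="utf-16"?><root><item>hello</item></root>'
result = parse_utf16_document(doc)
```

After this runs, result.elements holds the element names in document order and "".join(result.text) holds the character data.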
-- http://mail.python.org/mailman/listinfo/python-list