On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <[EMAIL PROTECTED] nomine.org> wrote:
> -On [20080201 19:06], JKPeck ([EMAIL PROTECTED]) wrote:
>
> >In both of these cases, there are only plain, 7-bit ascii characters
> >in the xml, and it really is valid utf-16 as far as I can tell.
>
> Did you mean to say that the only characters they used in the UTF-16 encoded
> file are characters from the Basic Latin Unicode block?
>
> --
> Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
> イェルーン ラウフロック ヴァン デル ウェルヴェン
> http://www.in-nomine.org/ | http://www.rangaku.org/
> We have met the enemy and they are ours...
It appears that the root cause of this problem is indeed passing a Unicode XML string to xml.sax.parseString when the XML's own encoding declaration says utf-16. This works with the standard distribution on Windows. It does not work with ActiveState on Windows, even though both distributions report 64K for sys.maxunicode.

I don't know why the results differ, but the problem is solved by encoding the Unicode string into utf-16 before passing it to the parser.

Thanks to all for helping to track this down.

Regards,
Jon Peck
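For the archive, here is a minimal sketch of the workaround described above: encode the Unicode string to utf-16 bytes first, so that what reaches the parser matches the encoding declaration. The handler class, the helper function, and the sample document are hypothetical names made up for illustration; only xml.sax.parseString and xml.sax.ContentHandler come from the standard library. It is written to run on a current Python as well, and it is not the original code from this thread.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_utf16_declared(xml_text):
    """Encode a Unicode XML string as utf-16 bytes, then parse it."""
    handler = ElementCollector()
    # Encode first so the bytes agree with the encoding="utf-16" declaration.
    # The "utf-16" codec prepends a byte order mark to the output.
    xml.sax.parseString(xml_text.encode("utf-16"), handler)
    return handler.names


# Illustrative document containing only plain 7-bit ASCII characters.
SAMPLE = u'<?xml version="1.0" encoding="utf-16"?><root><item>text</item></root>'
names = parse_utf16_declared(SAMPLE)
```

The point of the sketch is only the .encode("utf-16") call ahead of parseString; whether passing the unencoded Unicode string works seems to depend on the distribution, as noted above.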
--
http://mail.python.org/mailman/listinfo/python-list