On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <[EMAIL PROTECTED]nomine.org> wrote:
> -On [20080201 19:06], JKPeck ([EMAIL PROTECTED]) wrote:
>
> >In both of these cases, there are only plain, 7-bit ascii characters
> >in the xml, and it really is valid utf-16 as far as I can tell.
>
> Did you mean to say that the only characters they used in the UTF-16 encoded
> file are characters from the Basic Latin Unicode block?
>
> --
> Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
> イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/
> We have met the enemy and they are ours...

It appears that the root cause of this problem is indeed passing a
Unicode XML string to xml.sax.parseString when the XML declaration
specifies an encoding of utf-16.  This works with the standard
distribution on Windows.  It does not work with ActiveState on Windows,
even though both distributions report 64K (65535) for sys.maxunicode.

So I don't know why the results are different, but the problem is
solved by encoding the Unicode string into utf-16 before passing it to
the parser.
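For anyone hitting the same thing, here is a minimal sketch of that
workaround, written against Python 3's xml.sax rather than the Python 2
builds discussed above (how a str argument is handled differs between
versions, so treat this as illustrative only). The NameCollector handler
and the sample document are hypothetical; the point is simply that the
bytes handed to parseString are real UTF-16, matching the declaration.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# A Unicode (str) document whose XML declaration claims utf-16.
xml_text = '<?xml version="1.0" encoding="utf-16"?><root><item>hello</item></root>'

handler = NameCollector()
# Encode to actual UTF-16 bytes so the byte stream agrees with the
# declaration; Python's "utf-16" codec also prepends a byte-order mark.
xml.sax.parseString(xml_text.encode("utf-16"), handler)
print(handler.names)  # ['root', 'item']
```

Encoding explicitly sidesteps any question of how a particular build
converts a Unicode string before it reaches expat.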

Thanks to all for helping to track this down.

Regards,
Jon Peck
-- 
http://mail.python.org/mailman/listinfo/python-list