[issue1767933] Badly formed XML using etree and utf-16

Richard urwin Fri, 14 Nov 2008 07:33:58 -0800

Richard urwin <[EMAIL PROTECTED]> added the comment:

This is a bug in two halves.


1. Not all characters in the file are UTF-16. The initial xml header
isn't, and the individual < > etc characters are not. This is just a
matter of extending the methodology to encode all characters and not
just the textual bits. There is no work-around except a five-minute hack
of the ElementTree.write() method.

2. Every write has a BOM, so corrupting the file in a manner analogous
to bug 555360. This is a result of using string.encode() and is a
well-known feature. It can be worked around by using UTF-16LE or
UTF-16BE which do not prepend a BOM, but then the file doesn't have any
BOM. A complete solution would be to rewrite ElementTree.write() to use
a different encoding methodology such as StreamWriter.

I have made the above hack and work-around for my own use, and I can
report that it produces perfect UTF-16.

----------
nosy: +rurwin
versions: +Python 2.6

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1767933>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue1767933] Badly formed XML using etree and utf-16

Reply via email to