Bugs item #1470540, was opened at 2006-04-15 00:07
Message generated for change (Comment added) made by ngrig
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1470540&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: XML
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Nikolai Grigoriev (ngrig)
Assigned to: Nobody/Anonymous (nobody)
Summary: XMLGenerator creates a mess with UTF-16

Initial Comment:
When output encoding in xml.sax.saxutils.XMLGenerator is set to UTF-16,
the result is a terrible mess. Namely:

- it does not encode the XML declaration at the very top of the file
  (leaving it in single-byte Latin);
- it leaves closing '>' of each start tag unencoded (that is, always
  outputs a single byte);
- it inserts a spurious byte order mark for each tag, each attribute,
  each text node, and each processing instruction.

A test illustrating the issue is attached. The issue is applicable to
both stable (2.4.3) and current (2.5) versions of Python.

---------------------------------------------

Looking in xml/sax/saxutils.py, I see the problem in
XMLGenerator._write():

- one-byte strings aren't recoded at all (sic!);
- two-byte strings are converted using unicode.encode(); this results
  in a BOM for each call of _write() on Unicode strings.

The issue is easy to fix by using StreamWriter instead of a plain
stream as the output sink. I am going to submit a patch shortly.

Regards,
Nikolai Grigoriev

----------------------------------------------------------------------

>Comment By: Nikolai Grigoriev (ngrig)
Date: 2006-04-16 11:42

Message:
Logged In: YES
user_id=195108

FYI: I posted a patch (#1470548) that fixes the issue.

Regards,
Nikolai Grigoriev
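The byte order mark behaviour described in the report can be reproduced with a small sketch. It is written for current Python 3, not the 2.4/2.5 code the report is about, and it does not use XMLGenerator or the attached test. The names `pieces` and `count_boms` are illustrative assumptions. It contrasts encoding each fragment on its own, as the report says `_write()` does, with writing the same fragments through a codecs StreamWriter, which is the kind of fix the report proposes.

```python
import codecs
import io

# Hypothetical fragments, standing in for separate _write() calls.
pieces = ['<doc', ' id="1"', '>', 'text', '</doc>']


def count_boms(data):
    """Count native-order UTF-16 BOMs at 2-byte aligned offsets."""
    return sum(
        1
        for i in range(0, len(data), 2)
        if data[i:i + 2] == codecs.BOM_UTF16
    )


# Encoding each fragment separately: every call prepends its own BOM.
naive = b"".join(p.encode("utf-16") for p in pieces)

# Writing through a StreamWriter: the BOM is emitted once, on the first write.
sink = io.BytesIO()
writer = codecs.getwriter("utf-16")(sink)
for p in pieces:
    writer.write(p)
streamed = sink.getvalue()

print(count_boms(naive))     # one BOM per fragment
print(count_boms(streamed))  # a single leading BOM
```

The streamed bytes decode back to the original text with `streamed.decode("utf-16")`, while the naively joined bytes carry a stray U+FEFF between fragments.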