Jerry Chen <je...@3rdengine.com> added the comment:

The attached patch includes Neil's original additions to test_xml_etree.py.
I also noticed that _encode_entity() wasn't being called in ElementTree in py3k. The important part is its nested function escape_entities(), which works in conjunction with _escape and _escape_map. In 2.x, _encode_entity() is used after _encode() raises a Unicode exception [1], so I figured it would make sense to take the core functionality of _escape_entities() and integrate it into _encode() in the same fashion: when an exception is raised.

Basically, I:

- changed the _escape regexp from "[\x0080-\uffff]" to "[\x80-xff]"
- extracted _encode_entity.escape_entities() and made it _escape_entities at module scope
- removed _encode_entity()
- added UnicodeEncodeError handling to _encode()

I'm not sure what the expected outcome is supposed to be when the text is of type str rather than bytes. With this patch, the output has b"tãt" rather than b"tãt".

Hope this is a step in the right direction.

[1] ElementTree.py:814, ElementTree.py:829, python 2.7 HEAD r50941

----------
nosy: +jcsalterego
Added file: http://bugs.python.org/file14340/issue6233-escape_entities.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6233>
_______________________________________