Jerry Chen <je...@3rdengine.com> added the comment:

The attached patch includes Neil's original additions to test_xml_etree.py.


I also noticed that _encode_entity() is never called in ElementTree in
py3k. The important part of it is the nested function
escape_entities(), which works together with _escape and _escape_map.

In 2.x, _encode_entity() is used after _encode() raises a Unicode
exception [1], so I figured it would make sense to take the core
functionality of escape_entities() and integrate it into _encode() in
the same fashion -- when an exception is raised.

Basically, I:
- changed the _escape regexp from using "[\u0080-\uffff]" to "[\x80-\xff]"
- extracted _encode_entity.escape_entities() and made it a
module-level function, _escape_entities()
- removed _encode_entity()
- added UnicodeEncodeError handling to _encode()

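I'm not sure what the expected output should be when the text is of
type str rather than bytes. With this patch, the output contains
b"t&#195;&#163;t" (one character reference per UTF-8 byte of the
non-ASCII character) rather than b"t&#227;t" (one reference for its
code point).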
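
To make the two outputs concrete, here is a small sketch of where each one
comes from. The variable names are made up for illustration, and I am
assuming the input text is "t\u00e3t": escaping the str per code point gives
the 227 form, while escaping the UTF-8 encoded bytes one byte at a time
gives the 195/163 form.

```python
text = "t\u00e3t"  # 't', U+00E3 (a with tilde), 't'

# Escape each non-ASCII character of the str by its code point.
per_code_point = "".join(
    c if ord(c) < 128 else "&#%d;" % ord(c) for c in text
).encode("ascii")

# Escape each non-ASCII byte of the UTF-8 encoding separately.
per_utf8_byte = "".join(
    chr(b) if b < 128 else "&#%d;" % b for b in text.encode("utf-8")
).encode("ascii")

print(per_code_point)  # b't&#227;t'
print(per_utf8_byte)   # b't&#195;&#163;t'
```

An XML parser reading the second form would see two separate characters
(U+00C3 and U+00A3) instead of U+00E3, which is why the difference matters.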

Hope this is a step in the right direction.

[1] ElementTree.py:814, ElementTree.py:829, python 2.7 HEAD r50941

----------
nosy: +jcsalterego
Added file: http://bugs.python.org/file14340/issue6233-escape_entities.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6233>
_______________________________________