Ezio Melotti <[email protected]> added the comment:

> I consider this an important missing backport for 2.7, since
> without this handler, the UTF-8 codecs in 2.7 and 3.x are
> incompatible and there's no other way to work around this
> other than to make use of the errorhandler conditionally
> depend on the Python version.

FWIW I tried to update the UTF-8 codec on trunk from RFC 2279 to RFC 3629
while working on #8271, and found this difference in the handling of
surrogates (only 3.x treats them as invalid).
I didn't change the behavior of the codec in the patch I attached to #8271
because it was out of the scope of that issue, but I consider the fact that
Python 2.x can encode surrogates to be a bug, because it doesn't follow
RFC 3629.
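A minimal illustration of the difference, assuming a Python 3 interpreter.
RFC 3629 excludes the surrogate range U+D800..U+DFFF, so Python 3's strict
UTF-8 codec refuses a lone surrogate, while Python 2.7 encodes u'\ud800' to
the three bytes '\xed\xa0\x80' with the default strict handler. The message
above does not name a specific handler; 'surrogatepass' is used here only
because it is the Python 3 error handler that reproduces those 2.x bytes.

```python
# Runs on Python 3, whose UTF-8 codec follows RFC 3629.
lone_surrogate = "\ud800"  # a high surrogate with no low surrogate after it

# Strict encoding refuses surrogate code points (U+D800..U+DFFF).
try:
    lone_surrogate.encode("utf-8")
    strict_reason = None
except UnicodeEncodeError as exc:
    strict_reason = exc.reason
print("strict encode:", strict_reason)  # strict encode: surrogates not allowed

# The 'surrogatepass' error handler emits the 3-byte form that Python 2.x
# produces by default for the same code point.
encoded = lone_surrogate.encode("utf-8", "surrogatepass")
print(encoded)  # b'\xed\xa0\x80'

# A strict decode rejects those bytes as well...
try:
    encoded.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False
print(strict_decode_ok)  # False

# ...but 'surrogatepass' round-trips them back to the original string.
print(encoded.decode("utf-8", "surrogatepass") == lone_surrogate)  # True
```

Code that has to produce the same bytes on both versions therefore needs to
pick the error handler based on the Python version, which is the
incompatibility described in the quoted text.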
IMHO Python 2.x should provide an RFC-3629-compliant UTF-8 codec; however, I
haven't had time yet to investigate how Python 3 handles this and what the
best solution would be (e.g. adding another codec or changing the default
behavior).
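The "adding another codec" option could look roughly like the sketch below.
It is a hypothetical illustration, not a proposed patch: the codec name
utf_8_rfc3629 and the helper functions are invented for this example. It
registers a search function with codecs.register that returns a CodecInfo
whose encoder explicitly rejects surrogates before delegating to the built-in
codecs.utf_8_encode / codecs.utf_8_decode functions.

```python
import codecs

# Hypothetical codec name, chosen only for this sketch. It uses underscores
# so that the name lookup normalization leaves it unchanged.
CODEC_NAME = "utf_8_rfc3629"


def _encode(text, errors="strict"):
    # RFC 3629 excludes U+D800..U+DFFF, so refuse them explicitly instead of
    # relying on whatever the underlying UTF-8 encoder does. For simplicity
    # this sketch raises for surrogates regardless of the `errors` argument.
    for index, char in enumerate(text):
        if 0xD800 <= ord(char) <= 0xDFFF:
            raise UnicodeEncodeError(
                CODEC_NAME, text, index, index + 1, "surrogates not allowed")
    # Returns an (encoded bytes, characters consumed) tuple.
    return codecs.utf_8_encode(text, errors)


def _decode(data, errors="strict"):
    # final=True: the whole input is available, so a truncated sequence at
    # the end is an error rather than something to wait for.
    return codecs.utf_8_decode(data, errors, True)


def _search(name):
    # Search functions receive the lowercased encoding name and return None
    # for names they do not handle, so other search functions get a turn.
    if name == CODEC_NAME:
        return codecs.CodecInfo(_encode, _decode, name=CODEC_NAME)
    return None


codecs.register(_search)

print("caf\u00e9".encode(CODEC_NAME))  # b'caf\xc3\xa9'
```

The explicit range check is the design choice that matters: it makes the
separate codec reject surrogates even on top of a base UTF-8 encoder that
would accept them, while leaving the existing "utf-8" codec untouched.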

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8438>
_______________________________________