Ezio Melotti <[email protected]> added the comment:

> I consider this an important missing backport for 2.7, since
> without this handler, the UTF-8 codecs in 2.7 and 3.x are
> incompatible and there's no other way to work around this
> other than to make use of the errorhandler conditionally
> depend on the Python version.
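The workaround described in the quote, where the error handler is chosen based on the Python version, could be sketched roughly as follows. The helper names are hypothetical. The sketch assumes that the 2.7 UTF-8 codec already encodes and decodes lone surrogates under the default "strict" handler, while 3.x needs the "surrogatepass" handler to get the same result.

```python
import sys


def surrogate_tolerant_errors():
    # Hypothetical helper: on 3.x, "strict" rejects lone surrogates, so
    # "surrogatepass" is needed. On 2.7 (assumed), "strict" already lets
    # them through and "surrogatepass" does not exist.
    return "surrogatepass" if sys.version_info[0] >= 3 else "strict"


def encode_utf8(text):
    # Encode text to UTF-8, letting lone surrogates round-trip on both versions.
    return text.encode("utf-8", surrogate_tolerant_errors())


def decode_utf8(data):
    # Decode UTF-8 bytes, accepting encoded surrogates on both versions.
    return data.decode("utf-8", surrogate_tolerant_errors())
```

This version check is the kind of conditional code that the requested backport is meant to make unnecessary.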
FWIW I tried to update the UTF-8 codec on trunk from RFC 2279 to RFC 3629 while working on #8271, and found this difference in the handling of surrogates: they are invalid only on 3.x. I didn't change the behavior of the codec in the patch I attached to #8271 because that was out of the scope of the issue. However, I consider it a bug that Python 2.x can encode surrogates, because that doesn't follow RFC 3629.

IMHO Python 2.x should provide an RFC-3629-compliant UTF-8 codec. I haven't had time yet to investigate how Python 3 handles this and what the best solution would be (e.g. adding another codec or changing the default behavior).

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8438>
_______________________________________
Python-bugs-list mailing list
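The RFC 2279 vs. RFC 3629 difference discussed above can be made concrete with a minimal single-code-point encoder. This is a hypothetical Python 3 sketch, not CPython's codec. With `rfc3629=True` it rejects surrogates (U+D800..U+DFFF) and anything above U+10FFFF. With `rfc3629=False` it encodes surrogates like any other 3-byte code point, which matches the 2.x behavior described in the comment. The 5- and 6-byte forms that RFC 2279 also allowed are left out for brevity.

```python
def encode_code_point(cp, rfc3629=True):
    """Encode one code point as UTF-8 bytes (illustrative sketch only)."""
    if cp < 0:
        raise ValueError("negative code point")
    if rfc3629:
        # RFC 3629: surrogates are not valid scalar values, and the range ends at U+10FFFF.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate U+%04X is not allowed by RFC 3629" % cp)
        if cp > 0x10FFFF:
            raise ValueError("code point above U+10FFFF")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (surrogates land here when allowed)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp < 0x200000:
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("5- and 6-byte RFC 2279 forms are omitted from this sketch")
```

On Python 3, `"\ud800".encode("utf-8")` raises `UnicodeEncodeError`, which corresponds to the `rfc3629=True` branch. `"\ud800".encode("utf-8", "surrogatepass")` gives the same bytes as `encode_code_point(0xD800, rfc3629=False)`.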
