Marc-Andre Lemburg <[email protected]> added the comment:

Ezio Melotti wrote:
>
> Ezio Melotti <[email protected]> added the comment:
>
>> I consider this an important missing backport for 2.7, since
>> without this handler, the UTF-8 codecs in 2.7 and 3.x are
>> incompatible and there's no other way to work around this
>> other than to make use of the errorhandler conditionally
>> depend on the Python version.
>
> FWIW I tried to update the UTF-8 codec on trunk from RFC 2279 to RFC 3629
> while working on #8271, and found out this difference in the handling of
> surrogates (only on 3.x they are invalid).
> I didn't change the behavior of the codec in the patch I attached to #8271
> because it was out of the scope of the issue, but I consider the fact that in
> Python 2.x surrogates can be encoded as a bug, because it doesn't follow RFC
> 3629.
> IMHO Python 2.x should provide an RFC-3629-compliant UTF-8 codec, however I
> didn't have time yet to investigate how Python 3 handles this and what is the
> best solution (e.g. adding another codec or change the default behavior).

We have good reasons to allow lone surrogates in the UTF-8 codec.
Please remember that Python is a programming language meant to
allow writing applications, which also includes constructing
Unicode data from scratch, rather than an application which is
only meant to work with UTF-8 data.

Also note that lone surrogates were considered valid UTF-8 at the
time of adding Unicode support to Python and many years after that.

Since the codec is used in lots of applications, following the
Unicode Consortium change in 2.7 is not possible. This is why
it was done in the 3.x branch and then only with the additional
surrogatepass handler to get back the old behavior where needed.

But this is getting off-topic for the issue in question... I'll open
a new ticket for the backports.

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue8438>
_______________________________________
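
For readers following the thread, here is a minimal sketch of the behavior under discussion, assuming a Python 3 interpreter where the surrogatepass error handler is available. The helper names utf8_error_handler and encode_strict are hypothetical and used only for illustration; they are not part of any patch attached to the issue.

```python
import codecs

# A lone high surrogate, i.e. a code point that RFC 3629 does not allow in UTF-8.
LONE_SURROGATE = "\ud800"


def utf8_error_handler():
    """Hypothetical helper: pick an error handler that lets lone surrogates through.

    On interpreters that do not know "surrogatepass", fall back to "strict",
    since the thread states that the 2.x codec encodes surrogates anyway.
    """
    try:
        codecs.lookup_error("surrogatepass")
    except LookupError:
        return "strict"
    return "surrogatepass"


def encode_strict(text):
    """Hypothetical helper: return the strict UTF-8 encoding, or None on failure."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


# On Python 3 the strict codec rejects the lone surrogate ...
strict_result = encode_strict(LONE_SURROGATE)

# ... while surrogatepass restores the old, permissive behavior.
handler = utf8_error_handler()
passed = LONE_SURROGATE.encode("utf-8", handler)
round_trip = passed.decode("utf-8", handler)
```

Probing with codecs.lookup_error is one way to avoid hard-coding a version check; whether that is preferable to testing sys.version_info is a judgment call.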
