Ezio Melotti <ezio.melo...@gmail.com> added the comment: > \xe0\x80 is not maximal subpart. Therefore, there must be two U+FFFD.
OK, now I get what you mean. The valid range for continuation bytes that can follow E0 is A0-BF, not 80-BF as usual, so \x80 is not a valid continuation byte here. While working on the patch I stumbled across this corner case and contacted the Unicode consortium to ask about it, as explained in msg129495. I don't remember all the details right now, but it that test was passing with my patch there must be something wrong somewhere (either in the patch, in the test, or in our understanding of the standard). ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue8271> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com