Serhiy Storchaka <storch...@gmail.com> added the comment:

> This might be just because it first checks if there two more bytes before
> checking if they are valid, but 'invalid continuation byte' works too.

Yes, this is an implementation detail. It is much easier and faster this way. Is it necessary to change it?

> Why not?

Maybe I'm wrong. I looked in "The Unicode Standard, Version 6.0" (http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf), pp. 95-97. The standard is not categorical about this, but it recommends that only a maximal subpart be replaced by U+FFFD. \xe0\x80 is not a maximal subpart, so there must be two U+FFFD. In that case, neither the previous nor the current implementation conforms to the standard.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________
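
To make the maximal-subpart argument concrete, here is a small sketch that counts how many U+FFFD characters the decoder emits. It assumes a recent Python 3 whose built-in UTF-8 decoder follows the recommendation discussed above; the helper name count_replacements is made up for illustration and is not part of any patch on this issue. The lead byte 0xE0 must be followed by a byte in 0xA0..0xBF, so in \xe0\x80 the byte 0x80 cannot continue the sequence and each byte should be replaced separately, whereas \xe0\xa0 is a truncated but otherwise valid prefix and should be replaced as one unit.

```python
# Sketch only: relies on the built-in UTF-8 decoder of a recent Python 3
# (assumption: it applies the "maximal subpart" replacement practice).
REPLACEMENT = "\ufffd"


def count_replacements(data: bytes) -> int:
    """Decode with errors='replace' and count the U+FFFD characters produced."""
    return data.decode("utf-8", errors="replace").count(REPLACEMENT)


# 0xE0 requires a second byte in 0xA0..0xBF, so 0x80 does not continue it:
# 0xE0 is replaced on its own, then the stray 0x80 is replaced as well.
print(count_replacements(b"\xe0\x80"))  # 2 under the maximal-subpart practice

# 0xE0 0xA0 is a valid but truncated prefix of a three-byte sequence,
# so the whole prefix is one maximal subpart and gets a single U+FFFD.
print(count_replacements(b"\xe0\xa0"))  # 1 under the maximal-subpart practice
```

An older interpreter that still has the behaviour reported in this issue may print different counts for the first case, which is exactly the discrepancy under discussion.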