[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

Serhiy Storchaka Thu, 17 May 2012 11:46:09 -0700

Serhiy Storchaka <storch...@gmail.com> added the comment:

> This might be just because it first checks if there are two more bytes before
> checking if they are valid, but 'invalid continuation byte' works too.


Yes, this is an implementation detail. It is much easier and faster. Is it
necessary to change it?
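The difference in check order can be illustrated with two hypothetical helpers. This is not CPython code: the function names are invented for this sketch, and the reason strings only mirror CPython's error messages. For a three-byte sequence cut short as \xe0\x80, checking the length first reports the end of data, while validating byte by byte stops at the bad continuation byte.

```python
UNEXPECTED_END = "unexpected end of data"
INVALID_CONT = "invalid continuation byte"


def reason_byte_by_byte(data: bytes, i: int, length: int, lo: int, hi: int) -> str:
    """Hypothetical: validate each continuation byte as soon as it is reached.

    lo..hi is the allowed range of the *second* byte (it depends on the lead
    byte); later continuation bytes must be in 0x80..0xBF.
    """
    for j in range(1, length):
        if i + j >= len(data):
            return UNEXPECTED_END
        low, high = (lo, hi) if j == 1 else (0x80, 0xBF)
        if not low <= data[i + j] <= high:
            return INVALID_CONT
    return "ok"


def reason_length_first(data: bytes, i: int, length: int, lo: int, hi: int) -> str:
    """Hypothetical: require the whole sequence to be present, then validate."""
    if i + length > len(data):
        return UNEXPECTED_END
    return reason_byte_by_byte(data, i, length, lo, hi)


# \xe0 starts a 3-byte sequence whose second byte must be in \xa0..\xbf.
print(reason_length_first(b"\xe0\x80", 0, 3, 0xA0, 0xBF))  # unexpected end of data
print(reason_byte_by_byte(b"\xe0\x80", 0, 3, 0xA0, 0xBF))  # invalid continuation byte
```

Both orders agree on well-formed input; they only differ in which error is reported first for a sequence that is both truncated and invalid.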

> Why not?

Maybe I'm wrong. I looked in "The Unicode Standard, Version
6.0" (http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf), pp. 95-97.
The standard is not categorical on this point, but it recommends that only
a maximal subpart be replaced by U+FFFD. \xe0\x80 is not a maximal
subpart: after \xe0 the next byte must be in the range \xa0-\xbf, so the
maximal subpart is \xe0 alone, and \x80 is a separate ill-formed byte.
Therefore, there must be two U+FFFD. In this case, neither the previous
nor the current implementation conforms to the standard.
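The maximal-subpart practice can be sketched as a small pure-Python decoder. This is a minimal illustration, not CPython's decoder: the names `decode_replace` and `_lead_info` are hypothetical, and the byte ranges are assumed to be those of the standard's table of well-formed UTF-8 byte sequences. The decoder consumes the longest prefix that could still start a well-formed sequence and replaces that prefix with a single U+FFFD, so \xe0\x80 yields two replacement characters, while the truncated but valid prefix \xe0\xa0 yields only one.

```python
REPLACEMENT = "\ufffd"


def _lead_info(lead: int):
    """Hypothetical helper: (sequence length, min, max of the second byte).

    Assumption: ranges follow the standard's table of well-formed UTF-8
    byte sequences. Returns None for bytes that never start a sequence.
    """
    if 0xC2 <= lead <= 0xDF:
        return 2, 0x80, 0xBF
    if lead == 0xE0:
        return 3, 0xA0, 0xBF  # excludes overlong forms
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return 3, 0x80, 0xBF
    if lead == 0xED:
        return 3, 0x80, 0x9F  # excludes surrogates U+D800..U+DFFF
    if lead == 0xF0:
        return 4, 0x90, 0xBF  # excludes overlong forms
    if 0xF1 <= lead <= 0xF3:
        return 4, 0x80, 0xBF
    if lead == 0xF4:
        return 4, 0x80, 0x8F  # excludes code points above U+10FFFF
    return None  # 0x80..0xC1 and 0xF5..0xFF


def decode_replace(data: bytes) -> str:
    """Hypothetical decoder: one U+FFFD per maximal subpart."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # ASCII
            out.append(chr(lead))
            i += 1
            continue
        info = _lead_info(lead)
        if info is None:  # byte can never start a sequence
            out.append(REPLACEMENT)
            i += 1
            continue
        length, lo, hi = info
        cp = lead & (0x7F >> length)  # payload bits of the lead byte
        j = 1
        while j < length:
            if i + j >= n or not lo <= data[i + j] <= hi:
                break  # data[i:i + j] is the maximal subpart
            cp = (cp << 6) | (data[i + j] & 0x3F)
            lo, hi = 0x80, 0xBF  # later bytes are plain continuation bytes
            j += 1
        out.append(chr(cp) if j == length else REPLACEMENT)
        i += j  # resume right after the consumed bytes
    return "".join(out)


print(ascii(decode_replace(b"\xe0\x80")))  # '\ufffd\ufffd'
print(ascii(decode_replace(b"\xe0\xa0")))  # '\ufffd'
```

The key design choice is that a byte which breaks a sequence is not swallowed: decoding resumes at that byte, so \x80 after \xe0 is examined on its own and gets its own U+FFFD.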

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8271>
_______________________________________
_______________________________________________
Python-bugs-list mailing list