[issue8271] str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0

Saul Spatz Thu, 17 May 2012 10:36:27 -0700

Saul Spatz <[email protected]> added the comment:

> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
> don't think that is right.


I think that one U+FFFD is correct. The only error is a premature end of
data.
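For comparison, here is a minimal sketch of the "maximal subpart" substitution practice that chapter 3 of the Unicode Standard recommends for U+FFFD replacement, assuming the well-formed UTF-8 byte ranges listed there. The names `CONT`, `tail_ranges` and `decode_maximal_subparts` are hypothetical illustrations, not part of Python's codec machinery, and this is not the CPython decoder.

```python
CONT = (0x80, 0xBF)  # ordinary continuation-byte range


def tail_ranges(lead):
    """Allowed ranges for the bytes following `lead`, or None if `lead`
    can never start a well-formed UTF-8 sequence (80..C1, F5..FF)."""
    if 0xC2 <= lead <= 0xDF:
        return [CONT]
    if lead == 0xE0:
        return [(0xA0, 0xBF), CONT]  # excludes overlong forms
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return [CONT, CONT]
    if lead == 0xED:
        return [(0x80, 0x9F), CONT]  # excludes surrogates D800..DFFF
    if lead == 0xF0:
        return [(0x90, 0xBF), CONT, CONT]  # excludes overlong forms
    if 0xF1 <= lead <= 0xF3:
        return [CONT, CONT, CONT]
    if lead == 0xF4:
        return [(0x80, 0x8F), CONT, CONT]  # caps code points at U+10FFFF
    return None


def decode_maximal_subparts(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD per maximal ill-formed subpart."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # ASCII
            out.append(chr(lead))
            i += 1
            continue
        ranges = tail_ranges(lead)
        if ranges is None:  # invalid lead byte: replace it alone
            out.append("\ufffd")
            i += 1
            continue
        # Accept following bytes for as long as they stay in range.
        j = i + 1
        for low, high in ranges:
            if j < n and low <= data[j] <= high:
                j += 1
            else:
                break
        if j - i == len(ranges) + 1:  # complete, well-formed sequence
            out.append(data[i:j].decode("utf-8"))
        else:  # the accepted prefix is one maximal subpart
            out.append("\ufffd")
        i = j
    return "".join(out)


print(ascii(decode_maximal_subparts(b"\xe0\x80")))  # '\ufffd\ufffd'
print(ascii(decode_maximal_subparts(b"\xe0\x00")))  # '\ufffd\x00'
```

Under this reading, b'\xe0\x80' yields two U+FFFD: 0x80 is outside the A0..BF range allowed after E0, so E0 alone is replaced, and the stray 0x80 is then replaced on its own. That is the count Serhiy's message below argues for.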
On Thu, May 17, 2012 at 12:31 PM, Serhiy Storchaka
<[email protected]> wrote:

>
> Serhiy Storchaka <[email protected]> added the comment:
>
> > The only issue left was about the number of U+FFFD generated with
> > invalid sequences in some cases.
> > My last patch has extensive tests for this, so you could try to apply it
> > (or copy the tests) and see if they all pass.
>
> The tests fail, but I'm not sure that the tests are correct.
>
> b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
> continuation byte'. This is a terminological issue.
>
> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
> don't think that is right.
>
> ----------
> title: str.decode('utf8',       'replace') -- conformance with Unicode
> 5.2.0 -> str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
>
> _______________________________________
> Python tracker <[email protected]>
> <http://bugs.python.org/issue8271>
> _______________________________________
>
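
The terminological point above concerns the `reason` string carried by `UnicodeDecodeError`. One way to see what a given interpreter reports for these inputs is to decode in strict mode and inspect the exception's standard `reason`, `start` and `end` attributes. The specific reasons and offsets depend on the CPython version, since that behavior is exactly what this issue is about; `SAMPLES` and `describe_error` are illustrative names only.

```python
from typing import Optional, Tuple

# Inputs discussed in this thread, plus a truncated-but-valid prefix.
SAMPLES = [b"\xe0", b"\xe0\x00", b"\xe0\x80", b"\xe0\xa0"]


def describe_error(data: bytes) -> Optional[Tuple[str, int, int]]:
    """Return (reason, start, end) of the first decoding error, or None."""
    try:
        data.decode("utf-8")  # the strict error handler is the default
    except UnicodeDecodeError as exc:
        return exc.reason, exc.start, exc.end
    return None


if __name__ == "__main__":
    for sample in SAMPLES:
        # Show the strict-mode diagnosis next to the 'replace' result.
        print(sample, describe_error(sample), ascii(sample.decode("utf-8", "replace")))
```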
