Saul Spatz <saul.sp...@gmail.com> added the comment:

> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
> don't think that is right.

I think that one U+FFFD is correct. The only error is a premature end of data.

On Thu, May 17, 2012 at 12:31 PM, Serhiy Storchaka <rep...@bugs.python.org> wrote:

> Serhiy Storchaka <storch...@gmail.com> added the comment:
>
> > The only issue left was about the number of U+FFFD generated with
> > invalid sequences in some cases.
> > My last patch has extensive tests for this, so you could try to apply it
> > (or copy the tests) and see if they all pass.
>
> Tests fails, but I'm not sure that the tests are correct.
>
> b'\xe0\x00' raises 'unexpected end of data' and not 'invalid
> continuation byte'. This is terminological issue.
>
> b'\xe0\x80'.decode('utf-8', 'replace') returns one U+FFFD and not two. I
> don't think that is right.
>
> ----------
> title: str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0 -> str.decode('utf8', 'replace') -- conformance with Unicode 5.2.0
>
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue8271>
> _______________________________________
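
For anyone who wants to check the two cases discussed above on their own interpreter, here is a minimal sketch. The helper names count_replacements and decode_error_reason are made up for illustration and are not part of the patch or of the standard library. The count and the error reason printed for each sample depend on the Python version and on whether a fix for this issue has been applied, so no expected output is given.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def count_replacements(data: bytes) -> int:
    """Count the U+FFFD characters produced by a lenient UTF-8 decode."""
    return data.decode("utf-8", "replace").count(REPLACEMENT)


def decode_error_reason(data: bytes):
    """Return the reason reported by a strict UTF-8 decode, or None on success."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


if __name__ == "__main__":
    # The two byte sequences from the discussion (hypothetical driver loop).
    for sample in (b"\xe0\x80", b"\xe0\x00"):
        print(sample, count_replacements(sample), decode_error_reason(sample))
```

Comparing the printed count for b'\xe0\x80' against the tests in the patch shows which of the two interpretations (one U+FFFD or two) a given build implements.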