New submission from Walter Dörwald <wal...@livinglogic.de>:

The following code issues a misleading exception message:

>>> b'\xed\xa0\xbd\xed\xb3\x9e'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

The cause for the exception is *not* an invalid continuation byte, but UTF-8 encoded surrogates. In fact using the 'surrogatepass' error handler doesn't raise an exception:

>>> b'\xed\xa0\xbd\xed\xb3\x9e'.decode("utf-8", "surrogatepass")
'\ud83d\udcde'

I would have expected an exception message like:

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-2: surrogates not allowed

(Note that the input bytes are an improperly UTF-8 encoded version of U+1F4DE (telephone receiver))

----------
components: Unicode
messages: 327357
nosy: doerwalter, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: Misleading error message in str.decode()
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34935>
_______________________________________
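
For reference, the claim that these six bytes are a surrogate pair standing in for U+1F4DE can be checked with a short script. This is only a sketch of the reasoning in the report, not part of the original submission; the variable names are illustrative, and the exact `start`, `end`, and `reason` values printed for the strict decode may differ between Python versions.

```python
data = b'\xed\xa0\xbd\xed\xb3\x9e'

# Strict decoding rejects the bytes; keep the exception to inspect what it reports.
strict_error = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc
    print(exc.start, exc.end, exc.reason)

# With 'surrogatepass' the same bytes decode to two surrogate code points.
pair = data.decode("utf-8", "surrogatepass")
assert pair == '\ud83d\udcde'

# Round-tripping through UTF-16 joins the high and low surrogate
# into the single code point they represent.
combined = pair.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
assert combined == '\U0001F4DE'

# The well-formed UTF-8 encoding of U+1F4DE is four bytes, not six.
proper = combined.encode("utf-8")
print(proper)
```

The six-byte input encodes each surrogate separately as a three-byte sequence, which is why the report suggests an error covering positions 0-2 rather than a complaint about a continuation byte.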