Opaque error message on UTF-8 decode

Chris Angelico Sun, 08 Mar 2015 14:18:10 -0700

>>> b"\xed\xb4\x80".decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte


But 0xED is not a continuation byte; it's a start byte, and a
perfectly valid one:

>>> b"\xed\x9f\xbf".decode()
'\ud7ff'
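
To see why one sequence is rejected and the other accepted, here is a
minimal sketch that unpacks a three-byte sequence by hand. The names
decode3 and is_surrogate are hypothetical helpers made up for this
illustration, and decode3 deliberately does no validation at all:

```python
def decode3(seq: bytes) -> int:
    """Unpack a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx)
    into the code point it would represent, with no validity checks."""
    b0, b1, b2 = seq
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def is_surrogate(cp: int) -> bool:
    """True if cp falls in the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


rejected = decode3(b"\xed\xb4\x80")   # the sequence Python refuses
accepted = decode3(b"\xed\x9f\xbf")   # the sequence Python accepts

print(hex(rejected), is_surrogate(rejected))
print(hex(accepted), is_surrogate(accepted))
```

The first sequence works out to U+DD00, which is in the surrogate
range; the second works out to U+D7FF, the last code point below that
range.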

Pike is more explicit about what the problem is:

> utf8_to_string("\xed\xb4\x80");
UTF-8 sequence beginning with 0xed 0xb4 at index 0 would decode to a
UTF-16 surrogate character.
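
A comparable message can be approximated in Python today by wrapping
the decode call. This is only a sketch: explain_decode is a
hypothetical helper, not anything in the standard library, and it
assumes that a 0xED start byte followed by a byte in 0xA0..0xBF at the
failure position is the surrogate case.

```python
def explain_decode(data: bytes) -> str:
    """Decode UTF-8, re-raising with a more specific message when the
    failure looks like an encoded UTF-16 surrogate (assumption: start
    byte 0xED followed by a byte in 0xA0..0xBF)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        i = exc.start  # index of the first byte the codec rejected
        if (data[i] == 0xED and i + 1 < len(data)
                and 0xA0 <= data[i + 1] <= 0xBF):
            raise ValueError(
                "UTF-8 sequence beginning with 0x%02x 0x%02x at index %d "
                "would decode to a UTF-16 surrogate"
                % (data[i], data[i + 1], i)
            ) from exc
        raise  # any other decode failure is passed through unchanged


try:
    explain_decode(b"\xed\xb4\x80")
except ValueError as err:
    print(err)
```

The original UnicodeDecodeError stays attached as the cause, so
nothing the codec reported is lost.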

Is this something where Python's error message could do with
improvement, or is it not worth the hassle? Should I raise a tracker
issue about this?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
