Re: Opaque error message on UTF-8 decode

Mark Lawrence Sun, 08 Mar 2015 14:26:20 -0700

On 08/03/2015 21:15, Chris Angelico wrote:

b"\xed\xb4\x80".decode()

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte


But 0xED is not a continuation byte, it's a start byte. And it's a
perfectly valid one:

b"\xed\x9f\xbf".decode()

'\ud7ff'

Pike is more explicit about what the problem is:

utf8_to_string("\xed\xb4\x80");

UTF-8 sequence beginning with 0xed 0xb4 at index 0 would decode to a
UTF-16 surrogate character.
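
To make the difference between the two byte strings concrete, here is a minimal sketch that combines the payload bits of a three-byte sequence by hand and checks the result against the surrogate range U+D800..U+DFFF. The helper names three_byte_code_point and is_surrogate are made up for illustration; this is not how CPython's decoder is written, only the arithmetic behind Pike's message.

```python
def three_byte_code_point(data: bytes) -> int:
    """Combine the payload bits of a 3-byte UTF-8 pattern: 1110xxxx 10xxxxxx 10xxxxxx."""
    b0, b1, b2 = data  # raises ValueError unless exactly three bytes are given
    if b0 & 0xF0 != 0xE0 or b1 & 0xC0 != 0x80 or b2 & 0xC0 != 0x80:
        raise ValueError("not a three-byte UTF-8 bit pattern")
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def is_surrogate(code_point: int) -> bool:
    """True for the UTF-16 surrogate range, which UTF-8 must not encode."""
    return 0xD800 <= code_point <= 0xDFFF


for raw in (b"\xed\x9f\xbf", b"\xed\xb4\x80"):
    cp = three_byte_code_point(raw)
    verdict = "surrogate, rejected by strict UTF-8" if is_surrogate(cp) else "valid"
    print(raw, "-> U+%04X" % cp, verdict)

# The 'surrogatepass' error handler lets the utf-8 codec through anyway,
# which shows what the rejected bytes would have decoded to.
print(ascii(b"\xed\xb4\x80".decode("utf-8", "surrogatepass")))
```

The first sequence works out to U+D7FF, the last code point below the surrogate block, and the second to U+DD00, a low surrogate. So both bytes after 0xED have the right continuation-byte bit pattern; the sequence is rejected because of the value it encodes.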

Is this something where Python's error message could do with
improvement, or is it not worth the hassle? Should I raise a tracker
issue about this?

ChrisA

I'd raise an issue so there's a formal record that we can refer to in the future. Besides, what's one issue like this compared to the "Python can't do decimal sums properly" which gets raised every few months by newbies :)


--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list
