>>> b"\xed\xb4\x80".decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte

But 0xED is not a continuation byte; it's a start byte. And it's a perfectly valid one:

>>> b"\xed\x9f\xbf".decode()
'\ud7ff'

Pike is more explicit about what the problem is:

> utf8_to_string("\xed\xb4\x80");
UTF-8 sequence beginning with 0xed 0xb4 at index 0 would decode to a UTF-16 surrogate character.

Is this something where Python's error message could do with improvement, or is it not worth the hassle? Should I raise a tracker issue about this?

ChrisA
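For anyone wanting to see roughly what a Pike-style message could look like from the Python side, here is a minimal sketch. It assumes the standard "surrogatepass" error handler is an acceptable way to probe the offending bytes; the helper name describe_utf8_failure and the wording of its messages are hypothetical, not anything CPython provides. With that handler, the three bytes above decode to the low surrogate U+DD00, which is why the strict decoder rejects them.

```python
def describe_utf8_failure(data: bytes) -> str:
    """Hypothetical helper: say why strict UTF-8 decoding of `data` fails."""
    try:
        data.decode("utf-8")
        return "valid UTF-8"
    except UnicodeDecodeError as exc:
        # Look at the three bytes starting where the strict decoder gave up.
        bad = data[exc.start:exc.start + 3]
        try:
            # "surrogatepass" accepts only encoded surrogates (U+D800..U+DFFF);
            # any other malformed input still raises UnicodeDecodeError.
            ch = bad.decode("utf-8", "surrogatepass")
        except UnicodeDecodeError:
            # Not a surrogate: fall back to the decoder's own reason.
            return f"byte 0x{data[exc.start]:02x} at index {exc.start}: {exc.reason}"
        shown = " ".join(f"0x{b:02x}" for b in bad)
        return (f"bytes {shown} at index {exc.start} "
                f"would decode to surrogate U+{ord(ch):04X}")


print(describe_utf8_failure(b"\xed\xb4\x80"))
print(describe_utf8_failure(b"\xed\x9f\xbf"))
```

This only wraps the existing decoder, so it says nothing about how the message would be produced inside the codec itself.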