Chris Angelico <ros...@gmail.com>:

> Once again, you appear to be surprised that invalid data is failing.
> Why is this so strange? U+DD00 is not a valid character. It is quite
> correct to throw this error.

'\udd00' is a valid str object:

   >>> '\udd00'
   '\udd00'
   >>> '\udd00'.encode('utf-32')
   b'\xff\xfe\x00\x00\x00\xdd\x00\x00'
   >>> '\udd00'.encode('utf-16')
   b'\xff\xfe\x00\xdd'

I was simply stating that UTF-8 is not a bijection between unicode
strings and octet strings (even forgetting Python). Enriching Unicode
with 128 surrogates (U+DC80..U+DCFF) establishes a bijection, but not
without side effects.


Marko

-- 
https://mail.python.org/mailman/listinfo/python-list
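
A minimal sketch of the U+DC80..U+DCFF point, assuming CPython 3 and its "surrogateescape" error handler (PEP 383) as the mechanism in question. The sample bytes and variable names are made up for illustration. Each undecodable byte 0xNN is smuggled in as U+DCNN and restored on encoding, while a surrogate outside that range, such as U+DD00, is still rejected by UTF-8.

```python
# Arbitrary octets that are not valid UTF-8 (illustrative sample data).
raw = b'abc\xff\x80'

# Decoding with surrogateescape maps each bad byte 0xNN to U+DCNN.
text = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler restores the original octets.
round_trip = text.encode('utf-8', errors='surrogateescape')

# Strict UTF-8 refuses a lone surrogate such as U+DD00.
try:
    '\udd00'.encode('utf-8')
    strict_rejected = False
except UnicodeEncodeError:
    strict_rejected = True

# surrogateescape only covers U+DC80..U+DCFF, so U+DD00 still fails.
try:
    '\udd00'.encode('utf-8', errors='surrogateescape')
    escape_rejected = False
except UnicodeEncodeError:
    escape_rejected = True
```

One of the side effects shows up here: the decoded string contains lone surrogates, so it cannot be encoded with the strict handler any more, and every consumer of the string has to know to use surrogateescape.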