Chris Angelico wrote:
> As to the notion of rejecting the construction of strings containing
> these invalid codepoints, I'm not sure. Are there any languages out
> there that have a Unicode string type that requires that all
> codepoints be valid (no surrogates, no U+FFFE, etc)?

U+FFFE and U+FFFF are *noncharacters*, not invalid. There are a total of
66 noncharacters in Unicode, and they are legal in strings.

http://www.unicode.org/faq/private_use.html#nonchar8

I think the only illegal code points are surrogates. Surrogates should
only appear as bytes in UTF-16 byte-strings.

--
Steven
--
https://mail.python.org/mailman/listinfo/python-list
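
To make the distinction concrete, here is a rough sketch in Python 3. It counts the noncharacters and shows that they encode without complaint, while a lone surrogate is rejected by the UTF-8 codec. The helper name is_noncharacter is made up for this illustration and is not part of any library.

```python
def is_noncharacter(cp):
    """Return True if cp is one of Unicode's noncharacters.

    Hypothetical helper, written for this example only.
    """
    # U+FDD0..U+FDEF is a block of 32 noncharacters; the other 34 are
    # the last two code points of each of the 17 planes (U+xxFFFE and
    # U+xxFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))  # 66

# A noncharacter is an ordinary code point as far as the codecs care.
encoded = "\ufffe".encode("utf-8")
print(encoded)  # b'\xef\xbf\xbe'

# A lone surrogate can sit in a Python 3 str, but the strict UTF-8
# codec refuses to encode it.
try:
    "\ud800".encode("utf-8")
    surrogate_rejected = False
except UnicodeEncodeError:
    surrogate_rejected = True
print(surrogate_rejected)  # True
```

Note that Python 3 itself still lets you build a str containing a lone surrogate; it is only the encoding step that fails, which is one possible answer to where such a check might live.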