On Fri, Jan 25, 2013 at 3:15 PM, Costello, Roger L. <[email protected]> wrote:
> I learned that the range from D800 to DFFF is reserved because it is used to
> create variable-length UTF-16 strings.
>
> Thus, there are no codepoints assigned to the range D800 to DFFF in UTF-16.
>
> Does that mean there are no codepoints assigned to the range D800 to DFFF in
> UTF-8 and UTF-32? I assume that's the case, but just want to check to be sure.
>

Characters are assigned to code points in the Unicode code point space, not
in the encodings. All the UTF encodings share the same code point space.
UTF-16 uses the values D800-DFFF as surrogate code units (in pairs, to
represent code points above FFFF), so those code points cannot themselves be
encoded in UTF-16. They are therefore reserved and will never have characters
assigned to them.

Each encoding has rules for encoding each code point value. The rules for
UTF-8 and UTF-32 *could* be extended to encode the values in D800-DFFF, but
since no characters are assigned to them, those values do not appear unless
something goes wrong. So the Unicode standard says that these values cannot
be encoded in UTF-8 and UTF-32 either.

Kind regards,
Martinho
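
P.S. If you want to check this yourself, here is a minimal sketch, assuming
Python 3 and its built-in codecs. The helper name can_encode is my own and
only for illustration. It tries to encode a single code point in each of the
three encodings and reports whether the codec accepts it.

```python
def can_encode(code_point: int, encoding: str) -> bool:
    """Return True if the codec accepts this single code point."""
    try:
        chr(code_point).encode(encoding)
        return True
    except UnicodeEncodeError:
        # Python's strict codecs refuse surrogate code points.
        return False


# Two ordinary code points around the range, plus both ends of D800-DFFF.
for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000):
    results = {enc: can_encode(cp, enc) for enc in ("utf-8", "utf-16", "utf-32")}
    print(f"U+{cp:04X}: {results}")

# The UTF-8 bit pattern *could* carry D800; Python only produces it when
# asked to bypass the check with the "surrogatepass" error handler.
forced = chr(0xD800).encode("utf-8", "surrogatepass")
print(forced.hex())
```

With the strict codecs, D800 and DFFF are rejected by all three encodings,
while D7FF and E000 are accepted. The last two lines show the "could be
extended" case: the bytes exist mechanically, but a conforming decoder will
reject them.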

