On Thu, Mar 13, 2008 at 10:55 AM, erik quanstrom <[EMAIL PROTECTED]> wrote:
> plan 9 supports utf16. that is codepoints u+0000 - u+ffff.

To be pedantic, UTF-16 can represent characters in the 'astral planes' via surrogate pairs (pairs of 16-bit code units in the range U+D800–U+DFFF); Plan 9's charset is approximately UCS-2. Java has the same trouble: its astral plane characters are first encoded as UTF-16 surrogate pairs, then each of those 16-bit values is encoded as UTF-8.

> to support larger characters, the starting point would be changing Rune
> from ushort to ulong and changing constants like UTFmax and fixing
> chartorune and runetochar. (and finding all the places that assume that
> UTFmax really is 3.)
>
> it's all very doable, but it would be a very invasive change.

Not really, since only the 2²⁰+2¹⁶ values from 0–0x10FFFF are needed and UTFmax only needs to go up to 4. An advantage of a wider Rune would be that out-of-band symbols like EOF and yacc terminals could be represented in the same data type as the characters.

On the other hand, there are more useful bits of Unicode that are unimplemented in Plan 9. Mañana (as in /sys/doc/utf.{html,ps,pdf}) never did come.

--Joel
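
P.S. To make the UTFmax = 4 point concrete, here is a rough sketch of what a wider Rune with 4-byte-aware conversion routines might look like. It is portable C99, not the Plan 9 libc source: the 32-bit Rune typedef, the Runemax constant, the const qualifiers, and the helper surrogatetorune are all assumptions of mine for illustration. chartorune here expects a NUL-terminated string and returns Runeerror with a length of 1 on malformed input.

```c
#include <stdint.h>

/* Assumption: a 32-bit Rune instead of Plan 9's ushort Rune. */
typedef uint32_t Rune;

enum {
	UTFmax    = 4,        /* longest UTF-8 sequence for U+0000..U+10FFFF */
	Runemax   = 0x10FFFF, /* largest Unicode code point (assumed name) */
	Runeerror = 0xFFFD    /* replacement character for bad input */
};

/* Encode *r into s, which must have room for UTFmax bytes; return the byte count. */
int
runetochar(char *s, const Rune *r)
{
	unsigned char *p = (unsigned char *)s;
	Rune c = *r;

	/* Out-of-range values and lone surrogates are not encodable. */
	if (c > Runemax || (c >= 0xD800 && c <= 0xDFFF))
		c = Runeerror;
	if (c < 0x80) {
		p[0] = (unsigned char)c;
		return 1;
	}
	if (c < 0x800) {
		p[0] = (unsigned char)(0xC0 | (c >> 6));
		p[1] = (unsigned char)(0x80 | (c & 0x3F));
		return 2;
	}
	if (c < 0x10000) {
		p[0] = (unsigned char)(0xE0 | (c >> 12));
		p[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
		p[2] = (unsigned char)(0x80 | (c & 0x3F));
		return 3;
	}
	p[0] = (unsigned char)(0xF0 | (c >> 18));
	p[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
	p[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
	p[3] = (unsigned char)(0x80 | (c & 0x3F));
	return 4;
}

/* Decode one code point from the NUL-terminated string s; return bytes consumed. */
int
chartorune(Rune *r, const char *s)
{
	/* Smallest code point that legitimately needs n bytes, indexed by n. */
	static const Rune least[] = { 0, 0, 0x80, 0x800, 0x10000 };
	const unsigned char *p = (const unsigned char *)s;
	Rune c;
	int i, n;

	if (p[0] < 0x80) {
		*r = p[0];
		return 1;
	}
	if ((p[0] & 0xE0) == 0xC0) {
		n = 2;
		c = p[0] & 0x1F;
	} else if ((p[0] & 0xF0) == 0xE0) {
		n = 3;
		c = p[0] & 0x0F;
	} else if ((p[0] & 0xF8) == 0xF0) {
		n = 4;
		c = p[0] & 0x07;
	} else
		goto bad;
	/* Bytes are checked in order, so a NUL stops the scan before overrunning. */
	for (i = 1; i < n; i++) {
		if ((p[i] & 0xC0) != 0x80)
			goto bad;
		c = (c << 6) | (p[i] & 0x3F);
	}
	/* Reject overlong forms, surrogates, and values past U+10FFFF. */
	if (c < least[n] || c > Runemax || (c >= 0xD800 && c <= 0xDFFF))
		goto bad;
	*r = c;
	return n;
bad:
	*r = Runeerror;
	return 1;
}

/* Hypothetical helper: combine a UTF-16 surrogate pair into one code point. */
Rune
surrogatetorune(uint16_t hi, uint16_t lo)
{
	if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
		return Runeerror;
	return 0x10000 + (((Rune)hi - 0xD800) << 10) + (Rune)(lo - 0xDC00);
}
```

With this, an astral character such as U+1D11E round-trips through runetochar and chartorune as a single 4-byte sequence, rather than as two separately encoded surrogates the way Java does it. Values above 0x10FFFF are still free in a 32-bit Rune, which is where out-of-band symbols like EOF could live; the encoder above deliberately refuses to write them out.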