On Thu, Mar 13, 2008 at 10:55 AM, erik quanstrom <[EMAIL PROTECTED]> wrote:
> plan 9 supports utf16.  that is codepoints u+0000 - u+ffff.

To be pedantic, UTF-16 can represent characters in the 'astral planes'
(U+10000 and up) via surrogate pairs (pairs of 16-bit code units in the
range U+D800–U+DFFF); Plan 9's charset is approximately UCS-2, which
has no surrogates.
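
For concreteness, here is a rough sketch of the surrogate arithmetic in
portable C99. The function names tosurrogates and fromsurrogates are
made up for this example and are not part of any Plan 9 library; the
use of <stdint.h> is likewise an assumption (Plan 9 proper would use
its own types).

```c
#include <stdint.h>

/*
 * Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair.
 * Returns 1 on success, 0 if c is not an astral code point.
 * (hypothetical helper, for illustration only)
 */
int
tosurrogates(uint32_t c, uint16_t *hi, uint16_t *lo)
{
	if(c < 0x10000 || c > 0x10FFFF)
		return 0;
	c -= 0x10000;			/* 20 bits remain */
	*hi = 0xD800 | (c >> 10);	/* high surrogate carries the top 10 bits */
	*lo = 0xDC00 | (c & 0x3FF);	/* low surrogate carries the bottom 10 bits */
	return 1;
}

/*
 * Recombine a surrogate pair into one code point.
 * Returns U+FFFD if hi/lo are not a valid high/low pair.
 */
uint32_t
fromsurrogates(uint16_t hi, uint16_t lo)
{
	if(hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
		return 0xFFFD;
	return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}
```

With these, U+1D11E (the musical G clef) splits into D834 DD1E, and
feeding that pair back to fromsurrogates returns 0x1D11E.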

Java has the same trouble; its astral plane characters are first
encoded as UTF-16 surrogate pairs, then those 16-bit values are
encoded as UTF-8.
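
To illustrate what that double encoding produces, here is a sketch in C
(not Java, and not taken from any Java source). As far as I know this
is the behaviour of Java's "modified UTF-8", the format used by
DataOutputStream.writeUTF and JNI, rather than of every Java encoder.
The names put3 and astraltosix are invented for the example.

```c
#include <stdint.h>

/* Encode one 16-bit value in U+0800..U+FFFF as three UTF-8-style bytes. */
static void
put3(uint16_t u, unsigned char *p)
{
	p[0] = 0xE0 | (u >> 12);
	p[1] = 0x80 | ((u >> 6) & 0x3F);
	p[2] = 0x80 | (u & 0x3F);
}

/*
 * Encode an astral code point the roundabout way: first as a UTF-16
 * surrogate pair, then each surrogate as three bytes.
 * Returns the number of bytes written (6), or 0 if c is out of range.
 * (hypothetical helper, for illustration only)
 */
int
astraltosix(uint32_t c, unsigned char *p)
{
	if(c < 0x10000 || c > 0x10FFFF)
		return 0;
	c -= 0x10000;
	put3(0xD800 | (c >> 10), p);		/* high surrogate */
	put3(0xDC00 | (c & 0x3FF), p + 3);	/* low surrogate */
	return 6;
}
```

For U+1D11E this writes ED A0 B4 ED B4 9E, where ordinary UTF-8 would
use the four bytes F0 9D 84 9E.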

> to support larger characters, the starting point would be changing Rune
> from ushort to ulong and changing constants like UTFmax and fixing
> chartorune and runetochar.  (and finding all the places that assume that
> UTFmax really is 3.)
>
> it's all very doable, but it would be a very invasive change.

Not really, since only the 2²⁰+2¹⁶ values from 0–0x10FFFF are needed
and UTFmax only needs to go up to 4.  An advantage of a wider Rune
would be that out-of-band symbols like EOF and yacc terminals could be
represented in the same data type as the characters.
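
A minimal sketch of what the widened encoder might look like, in
portable C99. Everything named here (Rune32, UTFmax4, Reof,
runetochar32) is hypothetical and chosen to avoid suggesting this is
the real libc interface; the actual runetochar takes a Rune pointer and
a char buffer. Reof is just one possible out-of-band value above
0x10FFFF.

```c
#include <stdint.h>

typedef uint32_t Rune32;	/* hypothetical widened Rune */

enum {
	UTFmax4   = 4,		/* hypothetical: maximum bytes per rune */
	Runemax   = 0x10FFFF,	/* largest Unicode code point */
	Runeerror = 0xFFFD,	/* replacement character */
	Reof      = 0x110000,	/* hypothetical out-of-band value, not a character */
};

/*
 * Encode c as UTF-8 into s (at least UTFmax4 bytes).
 * Surrogates and anything above Runemax, including Reof,
 * are encoded as Runeerror.  Returns the number of bytes written.
 */
int
runetochar32(unsigned char *s, Rune32 c)
{
	if(c > Runemax || (c >= 0xD800 && c <= 0xDFFF))
		c = Runeerror;
	if(c < 0x80){
		s[0] = c;
		return 1;
	}
	if(c < 0x800){
		s[0] = 0xC0 | (c >> 6);
		s[1] = 0x80 | (c & 0x3F);
		return 2;
	}
	if(c < 0x10000){
		s[0] = 0xE0 | (c >> 12);
		s[1] = 0x80 | ((c >> 6) & 0x3F);
		s[2] = 0x80 | (c & 0x3F);
		return 3;
	}
	s[0] = 0xF0 | (c >> 18);
	s[1] = 0x80 | ((c >> 12) & 0x3F);
	s[2] = 0x80 | ((c >> 6) & 0x3F);
	s[3] = 0x80 | (c & 0x3F);
	return 4;
}
```

Only the last branch is new relative to a three-byte encoder; a value
like Reof can travel in a Rune32 variable but never reaches the byte
stream as itself.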

On the other hand, there are more useful bits of Unicode that are
unimplemented in Plan 9.  Mañana (as in /sys/doc/utf.{html,ps,pdf})
never did come.

--Joel
