Russ Allbery <[EMAIL PROTECTED]> writes:

> That's probably unnecessary; I really don't expect them to ever use all
> 31 bytes that the IETF-standardized version of UTF-8 supports.

31 bits, rather.  *sigh*

But given that, modulo some debate over CJKV, we're getting into *really*
obscure stuff already at only 94,140 characters, I'm guessing that there
would have to be some really major and fundamental changes in written
human communication before something more than two billion characters are
used.  That doesn't mean ruling out the possibility of ever expanding,
since one should always leave that option open, but expending coding
effort on it isn't worth it.  Particularly since extending UTF-8 to more
than 31 bits requires breaking some of the guarantees that UTF-8 makes,
unless I'm missing how you're encoding the first byte so as not to give it
a value of 0xFE.
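
To make the 0xFE point concrete, here is a minimal sketch of the original
(RFC 2279-style) UTF-8 encoder. It is in Python, and the name
utf8_encode_31bit is made up for illustration. That scheme uses up to six
bytes for values up to 2^31 - 1 (about 2.1 billion). The longest form's lead
byte is 1111110x, so it tops out at 0xFD. A seventh form for 32-bit or larger
values would need a lead byte of 0xFE, a value the original UTF-8 promises
never to produce. The sketch does not reject surrogates, matching RFC 2279
rather than the later 21-bit UTF-8.

```python
def utf8_encode_31bit(cp: int) -> bytes:
    """Hypothetical helper: encode a value with the original six-byte UTF-8
    scheme (RFC 2279 style), which covers 0 .. 2**31 - 1."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII, single byte
    # (exclusive upper bound, total bytes, lead-byte marker bits)
    forms = [
        (0x800, 2, 0xC0),       # 110xxxxx
        (0x10000, 3, 0xE0),     # 1110xxxx
        (0x200000, 4, 0xF0),    # 11110xxx
        (0x4000000, 5, 0xF8),   # 111110xx
        (0x80000000, 6, 0xFC),  # 1111110x, so lead byte is at most 0xFD
    ]
    for limit, n, marker in forms:
        if cp < limit:
            out = []
            # Emit continuation bytes (10xxxxxx) from the low end upward.
            for _ in range(n - 1):
                out.append(0x80 | (cp & 0x3F))
                cp >>= 6
            out.append(marker | cp)  # remaining high bits go in the lead byte
            return bytes(reversed(out))
    raise AssertionError("unreachable: range already checked")


# The largest 31-bit value needs all six bytes, and its lead byte is 0xFD.
print(utf8_encode_31bit(0x7FFFFFFF).hex())
```

For values up to U+10FFFF this produces the same bytes as today's UTF-8;
only the five- and six-byte forms are extras from the 31-bit scheme.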

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
