Bart Lateur <[EMAIL PROTECTED]> writes:
> On 05 Jun 2001 11:07:11 -0700, Russ Allbery wrote:
>> Particularly since part of his contention is that 16 bits isn't enough,
>> and I think all the widely used national character sets are no more
>> than 16 bits, aren't they?
> It's not really important.
Well, it is for trying to understand what his point is. I realize that
Unicode is four bytes (insert handwaving here -- this is not an exact
statement); that's not what I was getting at.
> UTF-8 is NOT limited to 16 bits (3 bytes).
That's an odd definition of byte you have there. :)
> With 4 bytes, UTF-8 can represent 20-bit characters, i.e. 6 times more
> than the "desired number" of 170000.
UTF-8 is a mapping from a 31-bit (yes, not 32, interestingly enough)
character numbering, and as such can represent over two billion
characters. For some reason that I've never understood, the Unicode folks
are limiting that to only a subset of what one can do with 31 bits by
putting an artificial cap on how high a character value they're willing
to assign, but even with that cap, once they start using the higher
planes there's easily enough space to add every character the author
mentioned and then some.
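To make that concrete, here's a rough sketch in C of the bit-packing
rules from RFC 2279 (the UTF-8 spec of the day), which cover the full
31-bit range in one to six bytes. The function name and layout are
mine; take it as illustration, not a reference implementation:

    #include <stdio.h>

    /* Sketch of the RFC 2279 UTF-8 bit-packing rules, covering the
     * full 31-bit range up to 0x7FFFFFFF in one to six bytes.
     * Writes into buf (room for six bytes) and returns the byte
     * count, or 0 if the value doesn't fit in 31 bits. */
    static int utf8_encode(unsigned long cp, unsigned char *buf)
    {
        if (cp < 0x80UL) {                       /*  7 bits, 1 byte  */
            buf[0] = cp;
            return 1;
        } else if (cp < 0x800UL) {               /* 11 bits, 2 bytes */
            buf[0] = 0xC0 | (cp >> 6);
            buf[1] = 0x80 | (cp & 0x3F);
            return 2;
        } else if (cp < 0x10000UL) {             /* 16 bits, 3 bytes */
            buf[0] = 0xE0 | (cp >> 12);
            buf[1] = 0x80 | ((cp >> 6) & 0x3F);
            buf[2] = 0x80 | (cp & 0x3F);
            return 3;
        } else if (cp < 0x200000UL) {            /* 21 bits, 4 bytes */
            buf[0] = 0xF0 | (cp >> 18);
            buf[1] = 0x80 | ((cp >> 12) & 0x3F);
            buf[2] = 0x80 | ((cp >> 6) & 0x3F);
            buf[3] = 0x80 | (cp & 0x3F);
            return 4;
        } else if (cp < 0x4000000UL) {           /* 26 bits, 5 bytes */
            buf[0] = 0xF8 | (cp >> 24);
            buf[1] = 0x80 | ((cp >> 18) & 0x3F);
            buf[2] = 0x80 | ((cp >> 12) & 0x3F);
            buf[3] = 0x80 | ((cp >> 6) & 0x3F);
            buf[4] = 0x80 | (cp & 0x3F);
            return 5;
        } else if (cp < 0x80000000UL) {          /* 31 bits, 6 bytes */
            buf[0] = 0xFC | (cp >> 30);
            buf[1] = 0x80 | ((cp >> 24) & 0x3F);
            buf[2] = 0x80 | ((cp >> 18) & 0x3F);
            buf[3] = 0x80 | ((cp >> 12) & 0x3F);
            buf[4] = 0x80 | ((cp >> 6) & 0x3F);
            buf[5] = 0x80 | (cp & 0x3F);
            return 6;
        }
        return 0;                                /* over 31 bits */
    }

    int main(void)
    {
        /* 'A', the euro sign, a plane-1 value, and the 31-bit max */
        unsigned long samples[] =
            { 0x41UL, 0x20ACUL, 0x10348UL, 0x7FFFFFFFUL };
        unsigned char buf[6];
        size_t i;
        int j, n;

        for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            n = utf8_encode(samples[i], buf);
            printf("U+%06lX ->", samples[i]);
            for (j = 0; j < n; j++)
                printf(" %02X", (unsigned) buf[j]);
            printf("  (%d bytes)\n", n);
        }
        return 0;
    }

Run it and you get the familiar E2 82 AC for the euro sign and
FD BF BF BF BF BF for the 31-bit maximum.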
(As an aside, UTF-8 also is not an X-byte encoding; it's a
variable-length encoding, with each character taking up anywhere from
one to six bytes in the encoded form depending on where in Unicode the
character falls.)
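If all you want is the size, the rule collapses to a range check.
Again, this is just a sketch of the RFC 2279 boundaries, nothing
normative:

    #include <stdio.h>

    /* How many bytes the RFC 2279 UTF-8 form of a 31-bit character
     * value takes; 0 means it doesn't fit in 31 bits at all. */
    static int utf8_len(unsigned long cp)
    {
        if (cp < 0x80UL)       return 1;  /* up to  7 bits */
        if (cp < 0x800UL)      return 2;  /* up to 11 bits */
        if (cp < 0x10000UL)    return 3;  /* up to 16 bits */
        if (cp < 0x200000UL)   return 4;  /* up to 21 bits */
        if (cp < 0x4000000UL)  return 5;  /* up to 26 bits */
        if (cp < 0x80000000UL) return 6;  /* up to 31 bits */
        return 0;
    }

    int main(void)
    {
        /* the last character value in each length class */
        unsigned long tops[] = { 0x7FUL, 0x7FFUL, 0xFFFFUL,
                                 0x1FFFFFUL, 0x3FFFFFFUL, 0x7FFFFFFFUL };
        size_t i;

        for (i = 0; i < sizeof tops / sizeof tops[0]; i++)
            printf("U+%lX fits in %d byte(s)\n",
                   tops[i], utf8_len(tops[i]));
        return 0;
    }

Note that the whole 16-bit range fits in three bytes, and four bytes
already reach past two million. (Four bytes actually carry 21 bits, not
20, for what it's worth.)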
--
Russ Allbery ([EMAIL PROTECTED]) <http://www.eyrie.org/~eagle/>