| LGB>> One glyph that takes 64 bits to encode...

LGB> | But not for any *technical* purpose. For all purposes of string
LGB> | processing, such as indexing, concatenation etc., this is *two*
LGB> | characters, not one.

LGB> Finding the length of the string...

Sorry, I don't understand. The length of the string U+0065 U+0301
certainly is 2, regardless of how the rendering engine displays it.
Of course, the rendering engine should render it as "é" because U+0301
is a combining character, but the string length is still 2. A
combining character is still a character. Do you think the length
should be 1? That holds only on the glyph layer, not on the backing
store layer. After all, this is the only way to get combinations like
U+0648 U+064E (Arabic for "and", with U+064E being a combining
character) to work: any Arabic reader will understand this to be *two*
characters, even though it is rendered much like a single glyph.
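A minimal Python sketch of the backing-store vs. glyph distinction, using only the standard `unicodedata` module. `naive_cluster_count` is a hypothetical helper: counting code points that are not combining marks is only a rough stand-in for real grapheme-cluster segmentation (Unicode UAX #29), not a correct implementation of it.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points in the store.
e_acute = "\u0065\u0301"
# U+0648 ARABIC LETTER WAW + U+064E ARABIC FATHA ("wa", "and").
wa = "\u0648\u064e"

def naive_cluster_count(s):
    """Hypothetical helper: approximate user-perceived characters by counting
    code points whose canonical combining class is 0. NOT a UAX #29 segmenter."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

for s in (e_acute, wa):
    names = [unicodedata.name(ch) for ch in s]
    # len() counts code points (backing store); the helper approximates glyphs.
    print(len(s), naive_cluster_count(s), names)

# NFC can fold e + U+0301 into precomposed U+00E9, but Unicode has no
# precomposed waw-with-fatha, so the Arabic string stays two code points.
print(len(unicodedata.normalize("NFC", e_acute)))  # 1
print(len(unicodedata.normalize("NFC", wa)))       # 2
```

Both strings have length 2 on the backing store; only the glyph-level view sees one unit, and normalization cannot collapse the Arabic pair at all.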

LGB> I thought 31-bit...

LGB> Just wait until they begin doing eastern languages for real.

Oh, but they *are* doing them for real... there are some 75,000
characters encoded as of now, and some 55,000 of these are Chinese
ideographs... On the Unicode list, they're pretty confident about 20
bits being sufficient. (And since they have people who actually *know*
something about the languages they're encoding, I don't really feel
bad about sharing their confidence, even though I don't have a clue
about Chinese. :-)) I mean, this does somehow remind me of the 640 kB
thingy, so with 32 bits one is on the safe side ;-)
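For the arithmetic behind these bit counts, a quick Python sketch, assuming today's Unicode code space of U+0000..U+10FFFF (17 planes of 65,536 code points). `to_surrogates` is a hypothetical helper written out for illustration, not a library function.

```python
# Assumed code space: U+0000..U+10FFFF, i.e. 17 planes of 2**16 code points.
PLANE_SIZE = 2 ** 16
NUM_PLANES = 17
MAX_CODE_POINT = 0x10FFFF

TOTAL = NUM_PLANES * PLANE_SIZE  # 1,114,112 code points
# The BMP (2**16) plus the 16 supplementary planes (16 * 2**16 == 2**20).
assert TOTAL == MAX_CODE_POINT + 1 == 2 ** 16 + 2 ** 20

def to_surrogates(cp):
    """Hypothetical helper: split a supplementary-plane code point into a
    UTF-16 high/low surrogate pair carrying 10 + 10 = 20 payload bits."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000  # 20-bit offset above the BMP
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

print(MAX_CODE_POINT.bit_length())                      # 21
print([hex(u) for u in to_surrogates(MAX_CODE_POINT)])  # ['0xdbff', '0xdfff']
```

So 20 bits cover everything above the BMP, the full space needs 21 bits per code point, and that fits in 32 bits (or the 31-bit ISO 10646 space) with lots of room to spare.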

Cheers -
  Philipp Reichmuth                            mailto:[EMAIL PROTECTED]

--
Server's poor response / Not quick enough for browser / Timed out, plum blossom
