| LGB>> One glyph that takes 64 bits to encode...
LGB> | But not for any *technical* purpose. For all purposes of string
LGB> | processing, such as indexing, concatenation etc., this is *two*
LGB> | characters, not one.
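
To make that concrete, here is a minimal Python sketch (Python and the helper name code_points are my own choice for illustration, nothing from this thread). It assumes a language whose strings are indexed by code point, as Python 3's are. Under that assumption, len(), indexing and concatenation all see two characters in each base-plus-combining sequence, whatever the renderer draws:

```python
import unicodedata

def code_points(s):
    """Return the code points of s in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]

e_acute = "e\u0301"      # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
wa = "\u0648\u064E"      # ARABIC LETTER WAW + ARABIC FATHA

for s in (e_acute, wa):
    # len() counts code points in the backing store, not rendered glyphs
    print(code_points(s), len(s))
    # a nonzero combining class marks the second code point as combining
    print(unicodedata.combining(s[1]) != 0)

# Indexing and concatenation work per code point as well
print(code_points(e_acute[0]))   # the base letter on its own
print(len(e_acute + wa))         # lengths simply add up
```

Counting what a reader perceives as one glyph would be a separate operation on top of this, on the rendering side.
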
LGB> Finding the length of the string...

Sorry, I don't understand. The length of the string U+0065 U+0301 is certainly 2, regardless of how the rendering engine displays it. Of course, the rendering engine should render it as "é", because U+0301 is a combining character, but the string length is still 2. A combining character is still a character. Do you think the length should be 1? That is only true on the glyph layer, not on the backing store layer. After all, this is the only way to get combinations like U+0648 U+064E (Arabic for "and", with U+064E being a combining character) to work: any Arab will understand this to be *two* characters, even though it is rendered looking much like a single glyph.

LGB> I thought 31-bit...
LGB> Just wait until they begin doing eastern languages for real.

Oh, but they *are* doing them for real... there are some 75,000 characters encoded as of now, and some 55,000 of these are Chinese symbols. On the Unicode list, they're pretty confident about 20 bits being sufficient. (And since they have people who actually *know* something about the languages they're encoding, I don't really feel bad about sharing their confidence, even though I don't have a clue about Chinese. :-))

I mean, this does somehow remind me of this 640 kB thingy, so with 32 bits one is on the safe side ;-)

Cheers - Philipp Reichmuth mailto:[EMAIL PROTECTED]

--
Server's poor response / Not quick enough for browser / Timed out, plum blossom