On Jun 14, 2004, at 1:54 PM, Dan Sugalski wrote:
Parrot provides code points for all graphemes, even for those character sets/encodings which don't inherently do so. Most sets that have variable-length encodings use an escape sequence scheme--the value of the first byte in a character determines whether the grapheme is a one-byte or a multi-byte sequence. When Parrot turns these into code points it does it by building up the final value. The first byte is put in the low 8 bits of the integer. If there's a second byte in the sequence, the current value is shifted left 8 bits and the new byte is stuffed in the low 8 bits. If there's a third byte in the sequence, everything is shifted left 8 bits again and that third byte is stuffed in the bottom, and so on.
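The shift-and-stuff scheme described above can be sketched roughly as follows. This is only an illustration in Python, not Parrot's actual code: the function name bytes_to_code_point is hypothetical, and it assumes the caller has already used the encoding's rules to work out which bytes belong to one character.

```python
def bytes_to_code_point(seq: bytes) -> int:
    """Fold the bytes of one character into a single integer.

    Hypothetical helper, not part of Parrot. Assumes `seq` holds exactly
    the bytes of one character, as determined by the encoding's own rules.
    """
    value = 0
    for byte in seq:
        # Shift what we have so far left 8 bits, then put the new byte
        # in the low 8 bits. The first byte ends up most significant.
        value = (value << 8) | byte
    return value


if __name__ == "__main__":
    # A two-byte sequence 0x81 0x40 becomes the integer 0x8140.
    print(hex(bytes_to_code_point(b"\x81\x40")))
```

A single-byte character comes out as its own byte value, so the scheme leaves one-byte characters unchanged.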
A grapheme consists of one or more code points. Is "provides code points for all graphemes" really what is intended here?
D'oh! No, that's not intended at all. I was sloppy with the search&replace on this. Good catch--thanks.
--
Dan
--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk