Dan Sugalski <[EMAIL PROTECTED]> wrote:

> Synthesized code points
>=======================

> Parrot provides code points for all graphemes, even for those
> character sets/encodings which don't inherently do so. Most sets that
> have variable-length encodings use an escape sequence scheme--the
> value of the first byte in a character determines whether the
> grapheme is a one or more byte sequence.

Doing so would require Parrot to have intimate knowledge of the encoding.
OTOH you write that we don't convert in the first place. That seems to
be a contradiction.
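
To make concrete what "knowledge of the encoding" means here, a minimal
sketch using UTF-8 as a stand-in for a lead-byte scheme. This is not Parrot
code: utf8_seq_len and utf8_count are hypothetical names, and the sketch
deliberately skips validation of continuation bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not a Parrot API: byte length of a UTF-8
 * sequence, decided from its first byte alone. Returns 0 for a byte
 * that cannot start a sequence (a continuation or invalid byte). */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII     */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx            */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx            */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx            */
    return 0;
}

/* Count code points by hopping from lead byte to lead byte.
 * Returns (size_t)-1 if a lead byte is invalid or the final
 * sequence is truncated. Continuation bytes are not checked. */
size_t utf8_count(const unsigned char *buf, size_t nbytes)
{
    size_t i = 0, count = 0;

    while (i < nbytes) {
        size_t len = utf8_seq_len(buf[i]);
        if (len == 0 || len > nbytes - i)
            return (size_t)-1;
        i += len;
        count++;
    }
    return count;
}
```

Every variable-length encoding needs its own version of that first
function, which is why handing out code points without converting looks
like it requires per-encoding logic inside Parrot.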

> (u)getstring     Sw, Sx, Iy, Iz

> (u)setstring     Sw, Sx, Iy, Iz

Does that mean that the current C<substr> opcodes get tossed?

> encoding Ix, Sy
> charset  Ix, Sy

How do we enumerate encodings and charsets? ICU's ucnv interface takes
an "encoding name".

leo
