At 4:04 PM +0200 6/15/04, Leopold Toetsch wrote:
Dan Sugalski <[EMAIL PROTECTED]> wrote:

 Synthesized code points
=======================

 Parrot provides code points for all graphemes, even for those
 character sets/encodings which don't inherently do so. Most sets that
 have variable-length encodings use an escape sequence scheme--the
value of the first byte in a character determines whether the
grapheme is a single-byte or multi-byte sequence.
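UTF-8 is one well-known encoding that uses this lead-byte scheme: the high bits of the first byte give the length of the sequence. The minimal C sketch below shows how a decoder gets from bytes to a code point. The function names `utf8_seq_len` and `utf8_decode` are hypothetical and are not Parrot's API.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence whose lead byte is b,
 * or 0 if b cannot start a sequence (e.g. a 10xxxxxx continuation byte). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: ASCII   */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx          */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx          */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx          */
    return 0;
}

/* Decode one code point from s, which has len bytes available.
 * On success, stores the bytes consumed in *used and returns the code point.
 * Returns -1 on truncated or malformed input. This minimal sketch does not
 * reject overlong forms or surrogates. */
static long utf8_decode(const unsigned char *s, size_t len, size_t *used)
{
    if (len == 0) return -1;
    size_t n = utf8_seq_len(s[0]);
    if (n == 0 || n > len) return -1;
    /* Keep only the payload bits of the lead byte (7, 5, 4 or 3 bits). */
    long cp = (n == 1) ? s[0] : (s[0] & (0x7F >> n));
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;  /* must be 10xxxxxx */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *used = n;
    return cp;
}
```

For the bytes `C3 A9` the lead byte announces a 2-byte sequence and the decoder yields U+00E9. For `E2 82 AC` it announces 3 bytes and yields U+20AC.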

Doing so would require Parrot to have intimate knowledge of the encoding.

Yes. There'll be an encoding vtable hanging off the strings.

OTOH, you also write that we don't convert in the first place. That
seems like a contradiction.

Nope. The encoding of a string can change, the same way the type of a PMC can change.
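To make the "encoding vtable hanging off the strings" idea concrete, here is a minimal C sketch. It assumes a layout in which each string carries a pointer to a table of per-encoding operations. Every name in it (`encoding_vtable`, `str`, `str_change_encoding`, `LATIN1`, `UCS4`) is hypothetical and is not Parrot's actual API. Changing a string's encoding rebuilds the buffer and swaps the table pointer, much as a PMC changes type by swapping its vtable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct str;

/* Hypothetical per-encoding operations (not Parrot's real layout). */
typedef struct encoding_vtable {
    const char *name;
    size_t max_bytes;                                 /* per code point   */
    size_t   (*length)(const struct str *s);          /* in code points   */
    uint32_t (*get)(const struct str *s, size_t i);   /* i-th code point  */
    size_t   (*put)(uint32_t cp, unsigned char *out); /* 0 = unencodable  */
} encoding_vtable;

/* A string: raw bytes plus the vtable that knows how to interpret them. */
typedef struct str {
    const encoding_vtable *enc;
    unsigned char *buf;   /* heap-allocated, owned by the string */
    size_t bytes;
} str;

/* Latin-1: one byte per code point, byte value == code point. */
static size_t latin1_length(const str *s) { return s->bytes; }
static uint32_t latin1_get(const str *s, size_t i) { return s->buf[i]; }
static size_t latin1_put(uint32_t cp, unsigned char *out)
{
    if (cp > 0xFF) return 0;            /* not representable in Latin-1 */
    out[0] = (unsigned char)cp;
    return 1;
}

/* UCS-4: four bytes per code point, native byte order. */
static size_t ucs4_length(const str *s) { return s->bytes / 4; }
static uint32_t ucs4_get(const str *s, size_t i)
{
    uint32_t cp;
    memcpy(&cp, s->buf + 4 * i, 4);
    return cp;
}
static size_t ucs4_put(uint32_t cp, unsigned char *out)
{
    memcpy(out, &cp, 4);
    return 4;
}

static const encoding_vtable LATIN1 = { "latin-1", 1, latin1_length, latin1_get, latin1_put };
static const encoding_vtable UCS4   = { "ucs-4",   4, ucs4_length,   ucs4_get,   ucs4_put   };

/* Re-encode s into target in place. The string object keeps its identity;
 * only its buffer and vtable pointer change. Returns 0 on success, or -1
 * (leaving s untouched) if allocation fails or a code point has no
 * representation in the target encoding. */
static int str_change_encoding(str *s, const encoding_vtable *target)
{
    size_t n = s->enc->length(s);
    size_t used = 0;
    unsigned char *nbuf = malloc(n * target->max_bytes + 1); /* +1 avoids malloc(0) */
    if (nbuf == NULL) return -1;
    for (size_t i = 0; i < n; i++) {
        size_t k = target->put(s->enc->get(s, i), nbuf + used);
        if (k == 0) { free(nbuf); return -1; }
        used += k;
    }
    free(s->buf);
    s->buf = nbuf;
    s->bytes = used;
    s->enc = target;
    return 0;
}
```

In this sketch, a Latin-1 string `h\xE9` becomes 8 bytes after conversion to UCS-4 and converts back losslessly. A UCS-4 string holding U+20AC cannot be converted to Latin-1 and is left unchanged. Both sample encodings are fixed-width so that `get` is O(1). A variable-width encoding such as UTF-8 would more likely want an iterator than indexed access.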


(u)getstring Sw, Sx, Iy, Iz
(u)setstring Sw, Sx, Iy, Iz

Does that mean that the current C<substr> opcodes get tossed?

Probably, or get aliased.

encoding Ix, Sy
charset Ix, Sy

How do we enumerate encodings and charsets? ICU's ucnv interface takes an "encoding name".

I think they're going to be arbitrary names, each provided by a loadable library. The encodings and charsets mostly have well-defined names as it is--it's not like there are too many variants of "Latin-1" kicking around. :)
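A name-based scheme like this could be backed by a small registry that maps names to integer handles and asks a loader when a name is not yet known. The C sketch below is purely hypothetical: `register_encoding`, `find_encoding`, the `loader` hook and `demo_loader` are illustrative names, and `demo_loader` stands in for actually loading a library (e.g. with `dlopen`). Names are compared case-insensitively on the assumption that "Latin-1" and "latin-1" should resolve to the same encoding.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_ENCODINGS 64

/* Hypothetical loader hook: try to load a library providing `name`
 * and register it. Returns nonzero on success. */
typedef int (*loader_fn)(const char *name);

/* Registered names; pointers must stay valid (string literals are fine). */
static const char *registry[MAX_ENCODINGS];
static int registry_count = 0;
static loader_fn loader = NULL;

/* ASCII case-insensitive comparison (portable, unlike POSIX strcasecmp). */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Handle of an already-registered name, or -1. */
static int lookup(const char *name)
{
    for (int i = 0; i < registry_count; i++)
        if (names_equal(registry[i], name)) return i;
    return -1;
}

/* Register a name and return its handle; -1 if the table is full. */
static int register_encoding(const char *name)
{
    int id = lookup(name);
    if (id >= 0) return id;
    if (registry_count == MAX_ENCODINGS) return -1;
    registry[registry_count] = name;
    return registry_count++;
}

/* Resolve a name to a handle, asking the loader once if it is unknown. */
static int find_encoding(const char *name)
{
    int id = lookup(name);
    if (id < 0 && loader != NULL && loader(name))
        id = lookup(name);
    return id;
}

/* Stand-in for loading a library: "provides" only ucs-4. */
static int demo_loader(const char *name)
{
    if (names_equal(name, "ucs-4"))
        return register_encoding("ucs-4") >= 0;
    return 0;
}
```

With `loader = demo_loader`, looking up `"UCS-4"` registers it on first use, while an unknown name such as `"ebcdic"` still resolves to -1. Enumeration then amounts to walking `registry`, which lists only the encodings loaded so far.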
--
Dan


--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
