I'm currently going through the various string functions and making them usable for all the string encodings we have. It's not finished yet, but a lot already works.

We have:

charsets:   binary, ascii, iso-8859-1, unicode
encodings:  fixed_8, utf8, utf16, ucs2

utf16 is a bit special: it immediately falls back to ucs2 if there are no surrogates in the string.
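
To illustrate the fallback rule, here is a minimal sketch in Python (not Parrot code; the function name pick_utf16_storage is made up for this mail). It assumes that "no surrogates" means every code point fits into 16 bits, i.e. is <= 0xFFFF.

```python
def pick_utf16_storage(text: str) -> str:
    """Return "ucs2" if text needs no surrogate pairs, else "utf16"."""
    # Code points above U+FFFF can only be stored as a surrogate pair in UTF-16.
    needs_surrogates = any(ord(ch) > 0xFFFF for ch in text)
    return "utf16" if needs_surrogates else "ucs2"
```

A string such as "abc" would come out as ucs2 here, while a string containing a code point like U+1F600 would stay utf16.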

The default charset is ascii.
The default encoding for (binary, ascii, iso-8859-1) is fixed_8.
The default encoding for unicode is utf8.

String operations on unicode strings either return utf8 strings (e.g. concat of utf8 and ascii) or create utf16/ucs2 strings.

Therefore, before a unicode string is sent to some output, it needs to be converted to the desired encoding, for example utf8. There are two ways to achieve this:

   getstdout P0      # get output handle - any ParrotIO PMC will do
   push P0, "utf8"   # push utf8 output filter on layer stack
   # all output to P0 will now be utf8

or

   find_encoding  I0, "utf8"   # or any other valid encoding
   trans_encoding S1, I0       # S1 is now utf8
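
For readers without a current Parrot build at hand, the difference between the two approaches can be sketched in Python. This is only an analogy, not Parrot's API: io.TextIOWrapper stands in for the output filter layer and str.encode stands in for trans_encoding. The example string is arbitrary.

```python
import io

text = "gr\u00fc\u00df"  # arbitrary example string with non-ASCII characters

# Way 1: put an encoding layer on the handle; everything written gets encoded.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8")
out.write(text)
out.flush()
via_layer = raw.getvalue()

# Way 2: convert the string itself; the result can then be written unchanged.
via_transcode = text.encode("utf-8")
```

Both paths should end up with the same bytes. The first keeps the conversion at the I/O boundary, the second changes the string itself.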

I hope these semantics are sane so far.

leo
