I'm currently going through the various string functions and making them
usable for all the string encodings we have. It's not finished yet, but a
lot already works.
We have:
charsets: binary, ascii, iso-8859-1, unicode
encodings: fixed_8, utf8, utf16, ucs2
utf16 is a bit special, as it falls back to ucs2 immediately if there
are no surrogates in the string.
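
As a rough illustration of that fallback rule, here is a sketch in Python
rather than Parrot's own code. The function name pick_wide_encoding is made
up for this example, and the sketch only assumes that "no surrogates" means
every code point fits in a single 16-bit unit:

```python
def pick_wide_encoding(text: str) -> str:
    """Return "ucs2" if no surrogate pairs would be needed, else "utf16".

    Hypothetical helper for illustration only; not a Parrot API.
    """
    # Code points above U+FFFF cannot be stored in one 16-bit unit,
    # so UTF-16 has to represent them as a surrogate pair.
    needs_surrogates = any(ord(ch) > 0xFFFF for ch in text)
    return "utf16" if needs_surrogates else "ucs2"


if __name__ == "__main__":
    print(pick_wide_encoding("caf\u00e9"))      # all code points are in the BMP
    print(pick_wide_encoding("\U0001D11E"))     # U+1D11E lies outside the BMP
```

The point of the fallback is that a ucs2 string has fixed-width characters,
so indexing stays cheap whenever no surrogates are present.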
The default charset is ascii.
The default encoding for (binary, ascii, iso-8859-1) is fixed_8.
The default encoding for unicode is utf8.
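
The defaults above amount to a small lookup table. A minimal sketch in
Python, where the names DEFAULT_CHARSET, DEFAULT_ENCODING and
default_encoding are hypothetical and only restate the rules listed here:

```python
# Default charset, as stated above.
DEFAULT_CHARSET = "ascii"

# Default encoding for each charset, as stated above.
DEFAULT_ENCODING = {
    "binary": "fixed_8",
    "ascii": "fixed_8",
    "iso-8859-1": "fixed_8",
    "unicode": "utf8",
}


def default_encoding(charset: str = DEFAULT_CHARSET) -> str:
    """Look up the default encoding for a charset (illustrative helper)."""
    return DEFAULT_ENCODING[charset]
```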
String operations with unicode either return utf8 strings (concat utf8,
ascii) or create utf16/ucs2 strings.
Therefore, before a unicode string is sent to some output, it needs
conversion to the desired encoding, possibly utf8. There are two ways to
achieve this:
    getstdout P0                # get output handle - any ParrotIO PMC will do
    push P0, "utf8"             # push utf8 output filter on layer stack
                                # all output to P0 will now be utf8
or
    find_encoding I0, "utf8"    # or any other valid encoding
    trans_encoding S1, I0       # S1 is now utf8
I hope these semantics are sane so far.
leo