On Mon, May 19, 2008 at 10:29:29AM -0700, Stephane Payrard wrote:
> But the following test fails. I pasted the content of the literal
> string with a character that emacs says to be #x8a0
>
> > my $s = " "; say $s.chars # $s == "\x8a0"
> 2
>
> I expected one.

Because Parrot's primary support for unicode is utf-8 encoding,
and because utf-8 greatly slows down parsing of long strings
(such as program source code), we've elected for the time being
to have rakudo use "fixed8" for its default input encoding.
When Parrot becomes faster at processing unicode strings, we'll
likely switch the default to utf8.(*)

This doesn't mean that unicode can't be used in rakudo programs,
though. One can always encode the character explicitly:

    $ ./parrot perl6.pbc
    > my $s = "€"; say $s.chars;       # doesn't work
    3
    > my $s = "\x20ac"; say $s.chars;  # works
    1

Also, rakudo understands the --encoding=utf8 option to specify
that the source code is coming in as UTF-8:

    $ ./parrot perl6.pbc --encoding=utf8
    > my $s = "€"; say $s.chars;       # works
    1

For now I'll mark this ticket as "stalled", awaiting faster
Parrot unicode support or a decision that we're going to live
with slower parsing of source code.

Thanks!

Pm

(*) Another option we might have could be to default to utf8
and transcode to ucs2 on platforms that have ICU present (which
can be faster), but stay at a fixed8 default for systems without
ICU. But at this stage I think consistency and explicit options
are better, otherwise people will be confused as to why a
particular program works on some systems but not others.
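
As a rough illustration of why the counts in the examples above come out as 3 and 1, here is a small sketch in Python (not Parrot or rakudo code). It assumes that "fixed8" treats every input byte as one character, and uses Python's "latin-1" codec as a stand-in for that behaviour; that mapping is an assumption made for the example, not something taken from Parrot's sources.

```python
# Illustration only: Python stands in for Parrot here.
# Assumption: a fixed 8-bit encoding counts one character per byte,
# which is approximated below with Python's "latin-1" codec.

euro = "\u20ac"                    # EURO SIGN, U+20AC
raw = euro.encode("utf-8")         # the bytes as they appear in a UTF-8 source file

as_fixed8 = raw.decode("latin-1")  # read byte by byte: one character per byte
as_utf8 = raw.decode("utf-8")      # read as UTF-8: the byte sequence is one character

print(len(raw))        # 3
print(len(as_fixed8))  # 3
print(len(as_utf8))    # 1
```

Under that assumption, a literal "€" pasted into UTF-8 source is seen as three separate characters unless the input is decoded as UTF-8, while an explicit escape such as "\x20ac" names the code point directly and does not depend on the source encoding.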