On Mon, May 19, 2008 at 10:29:29AM -0700, Stephane Payrard wrote:
> But the following test fails. I pasted the content of the literal
> string with a character that emacs says to be #x8a0
> 
> > my $s = " "; say  $s.chars  # $s == "\x8a0"
> 2
> 
> I expected one.

Because Parrot's primary support for Unicode is the utf-8 encoding,
and because utf-8 greatly slows down parsing of long strings
(such as program source code), we've elected for the time being
to have rakudo use "fixed8" as its default input encoding.  Under
fixed8 every byte of the source is read as one character, so a
character that arrives as a multi-byte utf-8 sequence is counted
once per byte.  When Parrot becomes faster at processing Unicode
strings, we'll likely switch the default to utf8.(*)

This doesn't mean that Unicode can't be used in rakudo programs,
though.  One can always write the character as an explicit escape:

    $ ./parrot perl6.pbc
    > my $s = "€"; say $s.chars;          # doesn't work
    3
    > my $s = "\x20ac"; say $s.chars;     # works
    1

Also, rakudo understands the --encoding=utf8 option to specify that
the source code is coming in as UTF-8:

    $ ./parrot perl6.pbc --encoding=utf8
    > my $s = "€"; say $s.chars;          # works
    1
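
To make the difference in counts concrete, here is a small sketch in Python rather than Perl 6. It is only an illustration of the byte-versus-character arithmetic, not rakudo or Parrot code. It assumes that "fixed8" behaves like a one-byte-per-character reading of the input, and uses Python's latin-1 codec as a stand-in for that; the helper name count_chars is made up for the example.

```python
# Illustration only: latin-1 stands in for a "fixed8"-style reading,
# where every input byte becomes exactly one character.
euro = "\u20ac"                        # the euro sign from the examples above
source_bytes = euro.encode("utf-8")    # how the character arrives in a UTF-8 source file


def count_chars(raw: bytes, encoding: str) -> int:
    """Decode raw bytes with the given encoding and count the resulting characters."""
    return len(raw.decode(encoding))


fixed8_count = count_chars(source_bytes, "latin-1")  # one character per byte
utf8_count = count_chars(source_bytes, "utf-8")      # multi-byte sequence is one character

print(len(source_bytes), fixed8_count, utf8_count)   # prints: 3 3 1
```

By the same arithmetic, a character whose utf-8 form is two bytes (U+00A0, for instance) would be counted as 2 under a byte-per-character reading, which is consistent with the count in the original report.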

For now I'll mark this ticket as "stalled", awaiting faster Parrot
unicode support or a decision that we're going to live with
slower parsing of source code.

Thanks!

Pm

(*) Another option would be to default to utf8 and transcode to
ucs2 on platforms that have ICU present (which can be faster),
but stay at a fixed8 default for systems without ICU.  At this
stage, though, I think consistency and explicit options are
better; otherwise people will be confused as to why a particular
program works on some systems but not others.
