On Tuesday 04 December 2007 08:14:41 Patrick R. Michaud wrote:

> If ICU isn't present, Parrot's C<downcase> opcode always throws
> an exception.  It does this even if the string contains codepoints
> only in the ascii and/or iso-8859-1 range.
>
> For example:
>
>     $ cat x.pir
>     .sub main :main
>         $S0 = unicode:"hello world"
>         $S1 = downcase $S0
>         say $S1
>     .end
>
>     $ ./parrot x.pir
>     no ICU lib loaded
>     current instr.: 'main' pc 3 (x.pir:3)
>
> This may cause a problem for Perl 6 programs, since the source
> code is always read as Unicode, and particularly affects the
> C< « > and C< » > characters (codepoints U+00ab and U+00bb).
>
> So far the major place I've run into this is in PGE, and I have
> a workaround there [1], but it will certainly crop up in many
> other places as we get more Perl 6 programs going.
>
> Pm
>
> [1]  PGE only has to downcase a single character at a time,
>      so instead of doing "$S1 = downcase $S0" it can cheat with
>
>          $I0 = ord $S0
>          $S1 = chr $I0
>          $S1 = downcase $S1
>
>      This works because chr with codepoints < 256 produces
>      strings as either ascii or iso-8859-1, and downcase can
>      work with that.

As a workaround (writing Unicode downcasing by hand in the absence of ICU
is... tricky), can you convert the strings from Unicode to ISO-8859-1 with
the trans_charset op?

    $I0 = find_charset 'iso-8859-1'
    $S0 = unicode:"Hello world"
    $S0 = trans_charset $S0, $I0

-- c