No, the bug is in using chr() to convert the byte. chr() appears to be defined as taking a Unicode codepoint and returning a UTF-8 character (which will be multibyte if the argument is > 127), not as taking an int and returning an 8-bit char with the same value. If this were Perl 5, I'd say you really wanted to use pack instead. We really need both conversion functions, and chr() can't be both.
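For illustration, here is a minimal Perl 5 sketch of the pack approach. It assumes the three octets came from URL-decoding "%E2%82%AC"; the variable names are made up for the example. It builds a byte string with pack and then decodes it as UTF-8 with the core Encode module, which is one way to get the two conversions separately:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the three octets of the UTF-8 encoding of U+20AC,
# as they might come out of a URL-encoded string "%E2%82%AC".
my @octets = (0xE2, 0x82, 0xAC);

# pack "C*" makes one 8-bit char per integer, i.e. a plain byte string.
my $bytes = pack "C*", @octets;

# Decoding those bytes as UTF-8 gives a character string.
my $text = decode("UTF-8", $bytes);

printf "bytes: %d\n", length $bytes;        # bytes: 3
printf "chars: %d\n", length $text;         # chars: 1
printf "codepoint: U+%04X\n", ord $text;    # codepoint: U+20AC
```

The point of the sketch is only that "int to byte" (pack) and "codepoint to character" (chr) are kept as two distinct steps; how Perl 6 should spell either of them is a separate question.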
--
Mark Biggar
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

> Hi,
>
> BÁRTHÁZI András wrote:
> > It's interesting, and it can be the problem, but I think, the CGI.pm
> > way is not the good solution to decode the URL encoded string: if you
> > say chr(0xE2)~chr(0x82)~chr(0xA2), then they are 3 characters, and
> > s:g/A2/AC/?
>
> I think we've discovered a bug in Pugs, but as I don't know that much
> about UTF-8, I'd like to see the following confirmed first :).
>
>   # This is what *should* happen:
>   my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
>   say $x.bytes;   # 3
>   say $x.chars;   # 1
>
>   # This is what currently happens:
>   my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
>   say $x.bytes;   # 6
>   say $x.chars;   # 3
>
> Comparison with perl5:
>
>   $ perl -MEncode -we '
>     my $x = decode "utf-8", chr(0xE2).chr(0x82).chr(0xAC);
>     print length $x;
>   '
>   1   # (chars)
>
>   $ perl -we '
>     my $x = chr(0xE2).chr(0x82).chr(0xAC);
>     print length $x;
>   '
>   3   # (bytes)
>
>
> --Ingo
>
> --
> Linux, the choice of a GNU | The computer revolution is over. The
> generation on a dual AMD   | computers won. -- Eduard Bloch
> Athlon!                    | <[EMAIL PROTECTED]>