Hi,

BÁRTHÁZI András wrote:
> It's interesting, and it can be the problem, but I think, the CGI.pm
> way is not the good solution to decode the URL encoded string: if you
> say chr(0xE2)~chr(0x82)~chr(0xA2), then they are 3 characters, and

s:g/A2/AC/?

I think we've discovered a bug in Pugs, but as I don't know that much
about UTF-8, I'd like to see the following confirmed first :).
  # This is what *should* happen:
  my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
  say $x.bytes;  # 3
  say $x.chars;  # 1

  # This is what currently happens:
  my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
  say $x.bytes;  # 6
  say $x.chars;  # 3

Comparison with Perl 5:
  $ perl -MEncode -we '
    my $x = decode "utf-8", chr(0xE2).chr(0x82).chr(0xAC);
    print length $x;
  '
  1 # (chars)

  $ perl -we '
    my $x = chr(0xE2).chr(0x82).chr(0xAC);
    print length $x;
  '
  3 # (bytes)


--Ingo

-- 
Linux, the choice of a GNU | The computer revolution is over. The
generation on a dual AMD   | computers won. -- Eduard Bloch <[EMAIL PROTECTED]>
Athlon!                    | 
