Re: pugs CGI.pm

mark . a . biggar Wed, 13 Apr 2005 10:33:54 -0700

No, the bug is using chr() to convert the byte. chr() appears to be defined as 
taking a Unicode codepoint and returning a UTF-8 character (which will be 
multibyte if the arg is >127), not as taking an int and returning an 8-bit char 
with the same value.  If this were Perl 5, I'd say you really wanted to use 
pack instead.  We really need both conversion functions, and chr() can't be both.
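
For illustration, here is a minimal Perl 5 sketch of the pack approach, 
assuming the core Encode module is available. The variable names are only 
illustrative. It builds the three octets directly and then decodes them as 
UTF-8 in a separate, explicit step:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Build a 3-octet string from raw byte values ("C*" = unsigned 8-bit chars).
my $octets = pack("C*", 0xE2, 0x82, 0xAC);

# Interpret those octets as UTF-8 to obtain a character string.
my $chars = decode("UTF-8", $octets);

# Report the byte count, the character count, and the decoded codepoint.
printf "%d octets, %d char(s), U+%04X\n",
    length($octets), length($chars), ord($chars);
```

Keeping the two steps apart (make bytes, then decode) is the distinction 
that a single chr() cannot express.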


--
Mark Biggar
[EMAIL PROTECTED]


> Hi,
> 
> > BÁRTHÁZI András wrote:
> > It's interesting, and it can be the problem, but I think, the CGI.pm
> > way is not the good solution to decode the URL encoded string: if you
> > say chr(0xE2)~chr(0x82)~chr(0xA2), then they are 3 characters, and
> 
> s:g/A2/AC/?
> 
> I think we've discovered a bug in Pugs, but as I don't know that much
> about UTF-8, I'd like to see the following confirmed first :).
>   # This is what *should* happen:
>   my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
>   say $x.bytes;  # 3
>   say $x.chars;  # 1
> 
>   # This is what currently happens:
>   my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
>   say $x.bytes;  # 6
>   say $x.chars;  # 3
> 
> Comparison with perl5:
>   $ perl -MEncode -we '
>     my $x = decode "utf-8", chr(0xE2).chr(0x82).chr(0xAC);
>     print length $x;
>   '
>   1 # (chars)
> 
>   $ perl -we '
>     my $x = chr(0xE2).chr(0x82).chr(0xAC);
>     print length $x;
>   '
>   3 # (bytes)
> 
> 
> --Ingo
> 
> -- 
> Linux, the choice of a GNU | The computer revolution is over. The
> generation on a dual AMD   | computers won. -- Eduard Bloch <[EMAIL PROTECTED]>
> Athlon!                    | 
>
