Re: pugs CGI.pm

2005-04-14 Thread Roie Marianer
On Wednesday 13 April 2005 9:23 pm, Ingo Blechschmidt wrote: > Ok, then it seems we need to have a builtin, such that: > new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq > "\xE2\x82\xAC" Doesn't it make more sense for a decode_utf8 function such that decode_utf8(0xE2, 0x82, 0xAC) e
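A rough Python rendering of the decode_utf8 idea, assuming the function takes individual byte values and returns the decoded string. The name comes from the message, but the signature and the Python translation are assumptions, not a Pugs builtin. The key point is that the values are collected as raw bytes, never as characters, before any UTF-8 decoding happens:

```python
def decode_utf8(*byte_values: int) -> str:
    """Hypothetical helper: gather raw byte values, then decode them as UTF-8."""
    # bytes() rejects anything outside 0..255, so a code point cannot pose as a byte.
    return bytes(byte_values).decode("utf-8")


euro = decode_utf8(0xE2, 0x82, 0xAC)
print(hex(ord(euro)), len(euro))  # 0x20ac 1
```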

Re: pugs CGI.pm

2005-04-13 Thread Ovid
--- Stevan Little <[EMAIL PROTECTED]> wrote: > Andras, > > On Apr 13, 2005, at 3:34 PM, BÁRTHÁZI András wrote: > > So, then here's a solution: > > http://barthazi.hu/decode.pugs > > > > It wasn't heavily tested (euro sign, all the Hungarian letters and > > some other works), but I think it can w

Re: pugs CGI.pm

2005-04-13 Thread Stevan Little
Andras, On Apr 13, 2005, at 3:34 PM, BÁRTHÁZI András wrote: So, then here's a solution: http://barthazi.hu/decode.pugs It wasn't heavily tested (euro sign, all the Hungarian letters and some other works), but I think it can work in all possible situations. Let me start by saying this would be an e

Re: pugs CGI.pm

2005-04-13 Thread BÁRTHÁZI András
Hi, ah! That makes perfect sense, thanks for clarifying matters! :) Ok, then it seems we need to have a builtin, such that: new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq "\xE2\x82\xAC" I think - conceptually - it cannot be done, because you cannot store a byte in a character str

Re: pugs CGI.pm

2005-04-13 Thread Dan Sugalski
At 8:51 PM +0200 4/13/05, BÁRTHÁZI András wrote: Hi, ah! That makes perfect sense, thanks for clarifying matters! :) Ok, then it seems we need to have a builtin, such that: new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq "\xE2\x82\xAC" I think - conceptually - it cannot be done, b

Re: pugs CGI.pm

2005-04-13 Thread Jonathan Scott Duff
On Wed, Apr 13, 2005 at 08:23:17PM +0200, Ingo Blechschmidt wrote: > ah! That makes perfect sense, thanks for clarifying matters! :) > > Ok, then it seems we need to have a builtin, such that: > new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq > "\xE2\x82\xAC" Hmm. Looks like you'

Re: pugs CGI.pm

2005-04-13 Thread BÁRTHÁZI András
Hi, So in the regex we have to determine whether we are unencoding a single-byte or multi-byte character. read in a single byte and pass it to chr(). I do not have enough experience with multi-byte characters to know when a byte can be recognized as the first byte of a multi-byte character, and t
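On the question of recognising the first byte of a multi-byte character: in UTF-8 the lead byte's high bits encode the sequence length, and continuation bytes always look like 10xxxxxx. A minimal Python sketch (the function name is hypothetical). It checks only the bit patterns, not finer validity rules such as overlong forms or the 0xF5..0xF7 range:

```python
def utf8_sequence_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with byte `lead`.

    Returns 0 for a continuation byte (10xxxxxx), which never starts a character.
    """
    if lead < 0x80:   # 0xxxxxxx: single byte (ASCII)
        return 1
    if lead < 0xC0:   # 10xxxxxx: continuation byte
        return 0
    if lead < 0xE0:   # 110xxxxx: two-byte sequence
        return 2
    if lead < 0xF0:   # 1110xxxx: three-byte sequence
        return 3
    if lead < 0xF8:   # 11110xxx: four-byte sequence
        return 4
    raise ValueError(f"0x{lead:02X} never appears in UTF-8")


print([utf8_sequence_length(b) for b in (0xE2, 0x82, 0xAC, 0x41)])  # [3, 0, 0, 1]
```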

Re: pugs CGI.pm

2005-04-13 Thread Ingo Blechschmidt
Hi, Roie Marianer wrote: >> # This is what *should* happen: >> my $x = chr(0xE2)~chr(0x82)~chr(0xAC); >> say $x.bytes; # 3 >> say $x.chars; # 1 >> >> # This is what currently happens: >> my $x = chr(0xE2)~chr(0x82)~chr(0xAC); >> say $x.bytes; # 6 >> say $x.chars; # 3 > > That

Re: pugs CGI.pm

2005-04-13 Thread BÁRTHÁZI András
Hi, It's interesting, and it can be the problem, but I think the CGI.pm way is not a good solution for decoding the URL-encoded string: if you say chr(0xE2)~chr(0x82)~chr(0xA2), then they are 3 characters, and s:g/A2/AC/? Yes, never mind that. First, I would like to tell you that I'm not the

Re: pugs CGI.pm

2005-04-13 Thread mark . a . biggar
No, the bug is using chr() to convert the byte, as it appears to be defined as taking a Unicode codepoint and returning a UTF-8 character (which will be multibyte if the arg is >127), not as taking an int and returning an 8-bit char with the same value. If this were perl 5, I'd say you really wanted
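The same distinction can be reproduced in Python, whose chr() likewise takes a Unicode code point. This sketch uses Python as a stand-in for Pugs to show why three chr() calls give 3 characters and 6 bytes, while treating the values as raw bytes gives 1 character and 3 bytes:

```python
# chr() maps a code point to a character, so three calls produce three characters,
# each of which needs two bytes once encoded as UTF-8.
wrong = chr(0xE2) + chr(0x82) + chr(0xAC)
print(len(wrong), len(wrong.encode("utf-8")))  # 3 6

# Treating the same values as raw bytes and decoding them yields one euro sign.
right = bytes([0xE2, 0x82, 0xAC]).decode("utf-8")
print(len(right), len(right.encode("utf-8")))  # 1 3
```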

Re: pugs CGI.pm

2005-04-13 Thread Roie Marianer
> I think we've discovered a bug in Pugs, but as I don't know that much > about UTF-8, I'd like to see the following confirmed first :). > # This is what *should* happen: > my $x = chr(0xE2)~chr(0x82)~chr(0xAC); > say $x.bytes; # 3 > say $x.chars; # 1 > > # This is what currently happen

Re: pugs CGI.pm

2005-04-13 Thread Nathan Gray
On Wed, Apr 13, 2005 at 09:52:41AM -0400, Stevan Little wrote: > On Apr 13, 2005, at 9:20 AM, BÁRTHÁZI András wrote: > >As Pugs works in UTF-8, my page is coded in UTF-8, too (and there are > >some other reasons, too). When I try to send an accented character to > >the server as parameter, for exa

Re: pugs CGI.pm

2005-04-13 Thread Ingo Blechschmidt
Hi, BÁRTHÁZI András wrote: > It's interesting, and it can be the problem, but I think, the CGI.pm > way is not the good solution to decode the URL encoded string: if you > say chr(0xE2)~chr(0x82)~chr(0xA2), then they are 3 characters, and s:g/A2/AC/? I think we've discovered a bug in Pugs, but a

Re: pugs CGI.pm

2005-04-13 Thread mark . a . biggar
The standard for URLs uses a double encoding: A URL is coded in UTF-8 and then all bytes with the high bit set are written in the %xx format. Therefore, if you just convert each %xx to the proper byte, the result is a valid UTF-8 string. You don't need to worry about multi-byte codes if UTF-8 is
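The two-step decoding described above (each %xx becomes one byte, then the whole byte string is decoded as UTF-8) can be sketched in Python as follows. The function name mirrors CGI.pm's url_decode but is a hypothetical stand-in. The sketch assumes the input is plain ASCII, as a percent-encoded URL is, and does not translate "+" to a space as HTML form data would require:

```python
import re

# One %xx escape: a percent sign followed by exactly two hex digits.
_PERCENT = re.compile(rb"%([0-9A-Fa-f]{2})")


def url_decode(encoded: str) -> str:
    """Turn each %xx into one raw byte, then decode the whole byte string as UTF-8."""
    raw = _PERCENT.sub(lambda m: bytes([int(m.group(1), 16)]), encoded.encode("ascii"))
    return raw.decode("utf-8")


print(url_decode("%E2%82%AC") == "\u20ac")  # True
```

Because the substitution works on bytes rather than characters, multi-byte sequences need no special handling: they reassemble themselves when the final decode runs. Python's standard library offers the same behaviour through urllib.parse.unquote, which decodes as UTF-8 by default.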

Re: pugs CGI.pm

2005-04-13 Thread BÁRTHÁZI András
Hi! the "XXX -- correct" refers to the :16 (IIRC, Larry said on p6l that he liked that, but I wasn't able to find it in the Synopses). BTW, Pugs' chr does understand input > 255 correctly: pugs> ord "€" 8364 pugs> chr 8364 '€' Yes, I know it. $decoded does contain valid UTF-8, the problem i

Re: pugs CGI.pm

2005-04-13 Thread Ingo Blechschmidt
Hi, Stevan Little wrote: > On Apr 13, 2005, at 9:20 AM, BÁRTHÁZI András wrote: >> The problem is with this line in sub url_decode(): >> >> $decoded ~~ s:perl5:g/%([\da-fA-F][\da-fA-F])/{chr(hex($1))}/; >> >> Have any idea, how to solve it? I think I should transform this code >> to recognize mult

Re: pugs CGI.pm

2005-04-13 Thread Stevan Little
Andras, I am CC-ing this to perl6-compiler in hopes that smarter people than I can better answer this question. On Apr 13, 2005, at 9:20 AM, BÁRTHÁZI András wrote: I'm trying to create a small web application, and hacking parameter handling now. As Pugs works in UTF-8, my page is coded in UTF-8