On Wednesday 13 April 2005 9:23 pm, Ingo Blechschmidt wrote:
> Ok, then it seems we need to have a builtin, such that:
> new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq
> "\xE2\x82\xAC"
Doesn't it make more sense for a decode_utf8 function such that
decode_utf8(0xE2, 0x82, 0xAC) e
--- Stevan Little <[EMAIL PROTECTED]> wrote:
> Andras,
>
> On Apr 13, 2005, at 3:34 PM, BÁRTHÁZI András wrote:
> > So, then here's a solution:
> > http://barthazi.hu/decode.pugs
> >
> > It wasn't heavily tested (euro sign, all the Hungarian letters and
> > some other works), but I think it can w
Andras,
On Apr 13, 2005, at 3:34 PM, BÁRTHÁZI András wrote:
So, then here's a solution:
http://barthazi.hu/decode.pugs
It wasn't heavily tested (euro sign, all the Hungarian letters and
some other works), but I think it can work in all possible situations.
Let me start by saying this would be an e
Hi,
ah! That makes perfect sense, thanks for clarifying matters! :)
Ok, then it seems we need to have a builtin, such that:
new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq
"\xE2\x82\xAC"
I think - conceptually - it cannot be done, because you cannot store a
byte in a character str
At 8:51 PM +0200 4/13/05, BÁRTHÁZI András wrote:
Hi,
ah! That makes perfect sense, thanks for clarifying matters! :)
Ok, then it seems we need to have a builtin, such that:
new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq
"\xE2\x82\xAC"
I think - conceptually - it cannot be done,
b
On Wed, Apr 13, 2005 at 08:23:17PM +0200, Ingo Blechschmidt wrote:
> ah! That makes perfect sense, thanks for clarifying matters! :)
>
> Ok, then it seems we need to have a builtin, such that:
> new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq
> "\xE2\x82\xAC"
Hmm. Looks like you'
Hi,
ah! That makes perfect sense, thanks for clarifying matters! :)
Ok, then it seems we need to have a builtin, such that:
new_builtin(0xE2) ~ new_builtin(0x82) ~ new_builtin(0xAC) eq
"\xE2\x82\xAC"
I think - conceptually - it cannot be done, because you cannot store a
byte in a character str
Hi,
So in the regex we have to determine whether we are unencoding a
single-byte or multi-byte character.
read in a single byte and pass it to chr(). I do not have enough
experience with multi-byte characters to know when a byte can be
recognized as the first byte of a multi-byte character, and t
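On the question of when a byte can be recognized as the first byte of a multi-byte character: in UTF-8 the high bits of each byte say whether it starts a sequence and how long that sequence is. A minimal sketch in Python rather than Pugs; the function name utf8_sequence_length is made up for illustration and is not part of any library discussed in this thread.

```python
def utf8_sequence_length(byte: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `byte` spans.

    Returns 0 when `byte` cannot start a sequence, i.e. it is a
    continuation byte (10xxxxxx) or a value UTF-8 never uses as a lead byte.
    """
    if byte < 0x80:            # 0xxxxxxx: single-byte character (ASCII)
        return 1
    if 0xC2 <= byte <= 0xDF:   # 110xxxxx: lead byte of a 2-byte sequence
        return 2               # (0xC0 and 0xC1 would be overlong, so invalid)
    if 0xE0 <= byte <= 0xEF:   # 1110xxxx: lead byte of a 3-byte sequence
        return 3
    if 0xF0 <= byte <= 0xF4:   # 11110xxx: lead byte of a 4-byte sequence
        return 4
    return 0                   # continuation byte or invalid lead byte


# The euro sign bytes from this thread: 0xE2 leads, 0x82 and 0xAC continue.
print([utf8_sequence_length(b) for b in (0xE2, 0x82, 0xAC)])
```

A decoder that works on raw bytes does not need this check in the regex itself; it is only needed if the characters are assembled one at a time.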
Hi,
Roie Marianer wrote:
>> # This is what *should* happen:
>> my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
>> say $x.bytes; # 3
>> say $x.chars; # 1
>>
>> # This is what currently happens:
>> my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
>> say $x.bytes; # 6
>> say $x.chars; # 3
>
> That
Hi,
It's interesting, and it can be the problem, but I think, the CGI.pm
way is not the good solution to decode the URL encoded string: if you
say chr(0xE2)~chr(0x82)~chr(0xA2), then they are 3 characters, and
s:g/A2/AC/?
Yes, don't care with it.
At first, I would like to tell you, that I'm not the
No, the bug is using chr() to convert the byte, as it appears to be defined as
taking a Unicode codepoint and returning a UTF-8 character (which will be
multibyte if the arg is >127), not as taking an int and returning an 8-bit char
with the same value. If this were perl 5, I'd say you really wanted
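The difference between the two meanings of chr() can be illustrated outside Pugs. The sketch below uses Python, whose chr() also takes a code point; it is only an analogy and says nothing about how Pugs implements chr(). The variable names are made up for illustration.

```python
# chr() takes a code point, so each call yields one character.
# Concatenating three of them gives three characters, each of which
# needs two bytes in UTF-8 because its code point is above 127.
as_codepoints = chr(0xE2) + chr(0x82) + chr(0xAC)
print(len(as_codepoints))                   # number of characters
print(len(as_codepoints.encode("utf-8")))   # number of bytes once encoded

# Treating the same three values as raw bytes and decoding them as
# UTF-8 gives a single character instead.
as_bytes = bytes([0xE2, 0x82, 0xAC])
decoded = as_bytes.decode("utf-8")
print(len(as_bytes))   # number of bytes
print(len(decoded))    # number of characters
print(ord(decoded))    # code point of that character
```

The first half reproduces the "6 bytes, 3 chars" result reported in this thread; the second half gives the "3 bytes, 1 char" result that was expected.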
> I think we've discovered a bug in Pugs, but as I don't know that much
> about UTF-8, I'd like to see the following confirmed first :).
> # This is what *should* happen:
> my $x = chr(0xE2)~chr(0x82)~chr(0xAC);
> say $x.bytes; # 3
> say $x.chars; # 1
>
> # This is what currently happen
On Wed, Apr 13, 2005 at 09:52:41AM -0400, Stevan Little wrote:
> On Apr 13, 2005, at 9:20 AM, BÁRTHÁZI András wrote:
> >As Pugs works in UTF-8, my page is coded in UTF-8, too (and there are
> >some other reasons, too). When I try to send an accented character to
> >the server as parameter, for exa
Hi,
BÁRTHÁZI András wrote:
> It's interesting, and it can be the problem, but I think, the CGI.pm
> way is not the good solution to decode the URL encoded string: if you
> say chr(0xE2)~chr(0x82)~chr(0xA2), then they are 3 characters, and
s:g/A2/AC/?
I think we've discovered a bug in Pugs, but a
The standard for URLs uses a double encoding: A URL is coded in UTF-8 and then
all bytes with high bits set are written in the %xx format. Therefore, if you
just convert each %xx to the proper byte, the result is a valid UTF-8 string.
You don't need to worry about multi-byte codes, if UTF-8 is
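The "convert each %xx to the proper byte" step can be sketched as follows. This is Python, not Pugs, and url_decode here is a hypothetical helper written for illustration, not the url_decode() from CGI.pm discussed in this thread. It assumes the encoded string is pure ASCII and it does not treat "+" as a space.

```python
import re


def url_decode(encoded: str) -> str:
    """Percent-decode `encoded`, then interpret the bytes as UTF-8."""
    # Work on bytes so that each %xx becomes exactly one byte,
    # not one character.
    raw = re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        encoded.encode("ascii"),
    )
    # Decode only after all bytes are in place; multi-byte sequences
    # need no special handling at the substitution step.
    return raw.decode("utf-8")


decoded = url_decode("%E2%82%AC")
print(len(decoded), ord(decoded))
```

Python's standard library offers the same behaviour through urllib.parse.unquote, which decodes as UTF-8 by default.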
Hi!
the "XXX -- correct" refers to the :16 (IIRC, Larry said on p6l that he
liked that, but I wasn't able to find it in the Synopses).
BTW, Pugs' chr does understand input > 255 correctly:
pugs> ord "€"
8364
pugs> chr 8364
'€'
Yes, I know it.
$decoded does contain valid UTF-8, the problem i
Hi,
Stevan Little wrote:
> On Apr 13, 2005, at 9:20 AM, BÁRTHÁZI András wrote:
>> The problem is with this line in sub url_decode():
>>
>> $decoded ~~ s:perl5:g/%([\da-fA-F][\da-fA-F])/{chr(hex($1))}/;
>>
>> Have any idea, how to solve it? I think I should transform this code
>> to recognize mult
Andras,
I am CC-ing this to perl6-compiler in hopes that smarter people than I
can better answer this question.
On Apr 13, 2005, at 9:20 AM, BÁRTHÁZI András wrote:
I'm trying to create a small web application, and hacking parameter
handling now.
As Pugs works in UTF-8, my page is coded in UTF-8