Andras,
I am CC-ing this to perl6-compiler in hopes that people smarter than I can better answer this question.
On Apr 13, 2005, at 9:20 AM, BÁRTHÁZI András wrote:
I'm trying to create a small web application, and hacking parameter handling now.
As Pugs works in UTF-8, my page is coded in UTF-8 as well (and there are some other reasons, too). When I try to send an accented character to the server as a parameter, for example the euro character, I get back a UTF-8 encoded character:
...?test=%E2%82%AC
That's OK, but when my code (and CGI.pm as well) tries to decode it, it gives back three characters instead of just one.
The problem is with this line in sub url_decode():
$decoded ~~ s:perl5:g/%([\da-fA-F][\da-fA-F])/{chr(hex($1))}/;
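The effect of that substitution can be reproduced outside Perl 6; here is a small Python sketch (Python used purely for illustration) showing why calling chr() on each %XX escape separately is wrong for a multi-byte UTF-8 sequence:

```python
# The euro sign is encoded in UTF-8 as the three bytes E2 82 AC,
# so the URL escape is %E2%82%AC.
escapes = [0xE2, 0x82, 0xAC]

# What the buggy regex does: turn each byte value into its own
# character -- this yields U+00E2 U+0082 U+00AC, three characters.
wrong = ''.join(chr(b) for b in escapes)

# What should happen: collect the raw bytes first, then decode
# the whole byte sequence as UTF-8 -- one character, the euro sign.
right = bytes(escapes).decode('utf-8')

print(len(wrong), repr(wrong))  # 3 'â\x82¬'
print(len(right), repr(right))  # 1 '€'
```

The per-escape chr() treats each byte as a standalone code point, which is why three characters come back instead of one.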
Any idea how to solve it? I think I should change this code to recognize multi-byte sequences, decode the full character value, and only then call chr() on that value. Or is there a way to produce a raw byte, rather than a character built by chr(), with some other function?
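The "byte, not character" idea in the question is exactly the usual fix: substitute each %XX escape with the raw byte it encodes, and only decode the result as UTF-8 at the very end. A minimal sketch in Python (for illustration only; the function name url_decode mirrors the sub under discussion but is otherwise my own, and it deliberately skips extras like '+'-to-space handling):

```python
import re

def url_decode(s):
    # Work on bytes: replace each %XX escape with the single raw
    # byte it encodes, leaving multi-byte UTF-8 sequences intact.
    raw = re.sub(
        rb'%([0-9a-fA-F]{2})',
        lambda m: bytes([int(m.group(1), 16)]),
        s.encode('ascii'),
    )
    # Only now interpret the accumulated bytes as UTF-8 text.
    return raw.decode('utf-8')

print(url_decode('test=%E2%82%AC'))  # test=€
```

The key point is the ordering: percent-decoding happens at the byte level, and character interpretation happens once, over the whole string, afterwards.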
To be honest, my experience with multi-byte character sets is very limited (my first real exposure has been on the Pugs project). However, I think/hope that the chr() builtin will eventually be able to handle multi-byte values itself. In the (non-working) port of CGI-Lite (http://tpe.freepan.org/repos/iblech/CGI-Lite/lib/CGI/Lite.pm), I saw code which did this:
/%(<[\da-fA-F]>**{2})/{chr :16($1)}/
Of course, it was followed by the comment "# XXX -- correct?", so it may not be anything official yet.
That is my best guess (and it is not very good), so I will leave you in the capable hands of the perl6-compiler crew.
- Stevan