Hi,
>> This code:
>>
>> my $a='A';
>> $a ~~ s:perl5:g/A/{chr(65535)}/;
>> say $a.bytes;
>>
>> Outputs "0". Why?
>
>
> \uFFFF is not a legal Unicode codepoint. chr(65535) should raise an exception of some type. So the above code does seem to show a possible bug. But as chr(65535) is an undefined char, who knows what the code is actually doing.
In my opinion (which may be wrong), \uFFFF can be stored as a UTF-8 character; it should be the three bytes 0xEF 0xBF 0xBF. If I do it outside the regexp (I mean "say chr(65535).bytes"), it works well.
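To show where those three bytes come from, here is a minimal sketch of the UTF-8 bit layout in Python. It is only an illustration, not what Pugs does internally; the helper name utf8_encode is made up for this example, and it does not reject surrogates.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point (0..0x10FFFF) as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


encoded = utf8_encode(0xFFFF)
print(" ".join(f"{b:02x}" for b in encoded))  # ef bf bf
print(len(encoded))  # 3
```

So a byte count of 3 is what I would expect for chr(65535), not 0.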
I've found another "bug". It is not related to regexps, but it also involves a Unicode character:
say chr(0x10FFFF).bytes;
The answer:
pugs: encodeUTF8: ord returned a value above 0x10FFFF
And if I start to increment $b, I will get:
pugs: Prelude.chr: bad argument
I don't understand this, as I thought that Unicode characters lie in the range 0x00000000-0x7FFFFFFF. Does Haskell not support the whole set?
There is a Unicode encoding called UCS-2 that covers only 0x0000-0xFFFF, but that still does not answer the question.
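For comparison, here is a small sketch of how another language treats the same boundary. It uses Python, so it says nothing definite about Pugs or Haskell; the helper try_chr and the constant MAX_CODE_POINT are names invented for this example. Python's chr() accepts values up to 0x10FFFF and rejects anything above it.

```python
MAX_CODE_POINT = 0x10FFFF  # highest value Python's chr() accepts


def try_chr(cp: int) -> str:
    """Describe what chr(cp) does: accepted (with UTF-8 length) or rejected."""
    try:
        ch = chr(cp)
    except (ValueError, OverflowError) as err:
        return f"{cp:#x}: rejected ({err})"
    return f"{cp:#x}: ok, {len(ch.encode('utf-8'))} UTF-8 bytes"


# Probe just below, at, and above the limit, plus the 31-bit maximum.
for cp in (0xFFFF, MAX_CODE_POINT, MAX_CODE_POINT + 1, 0x7FFFFFFF):
    print(try_chr(cp))
```

Here 0xFFFF and 0x10FFFF are accepted (3 and 4 UTF-8 bytes), while 0x110000 and 0x7FFFFFFF are rejected. That suggests the limit I hit is not specific to one implementation.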
[...]
Meanwhile, I've found this: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2175.htm
It may be the answer to my question.
Bye, Andras