"Mark A. Biggar" <[EMAIL PROTECTED]> wrote:
:BÁRTHÁZI András wrote:
:
:> Hi,
:>
:> This code:
:>
:> my $a='A';
:> $a ~~ s:perl5:g/A/{chr(65535)}/;
:> say $a.bytes;
:>
:> Outputs "0". Why?
:>
:> Bye,
:> Andras
:>
:
:\uFFFF is not a legal unicode codepoint. chr(65535) should raise an
:exception of some type. So the above code does seem to show a possible
:bug. But as that chr(65535) is an undefined char, who knows what the
:code is actually doing.

In perl5 at least, we support a wider concept of codepoints than the
Unicode consortium. This allows us to use strings for a wider variety
of things than just Unicode text (eg version strings, bit vectors etc).

In perl6 the greatly expanded set of types will presumably allow us
to distinguish actual Unicode data from more arbitrary sequences of
codepoints, and I'd normally expect that the more constrained type
would be a subtype of the less constrained type. In this case that
means I'd expect "Unicode string" to be a subtype of something like
"codepoint sequence".

(In fact it'd probably be useful to have more levels than that - there
are times when you need the Unicode concepts for things like [[:digit:]],
but may be able to get better performance by avoiding the checks for
'legal Unicode codepoint'.)

On the other hand you will probably be able to achieve the things p5
overloads onto strings using packed integer arrays, so maybe this all
represents unnecessary complications. In which case maybe 'relaxed'
variants of Unicode strings aren't needed. We will probably still want
other sorts of strings though, such as ASCII.

Hugo
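
To make the perl5 side of this concrete, here is a minimal sketch of the kinds of non-text use mentioned above: a string holding the codepoint 65535, a version string, and a bit vector. The variable names are purely illustrative, and this shows perl5 behaviour only; it says nothing about what perl6 or pugs should do with the original example.

```perl
use strict;
use warnings;
no warnings 'utf8';   # we handle U+FFFF on purpose; avoid noncharacter notices

# 1. perl5's chr() accepts 65535 even though U+FFFF is a Unicode noncharacter.
my $s     = chr(65535);
my $chars = length $s;                      # length in characters
my $bytes = do { use bytes; length $s };    # length of perl's internal (UTF-8) encoding

# 2. A version string is a sequence of small integers stored as characters.
my $version = sprintf "%vd", v1.22.333;     # print each ordinal, joined by '.'
my $same    = (v1.22.333 eq join '', map { chr($_) } 1, 22, 333);

# 3. A string used as a bit vector through vec().
my $bits = '';
vec($bits, 3, 1) = 1;                       # set bit 3, leave the others clear
my $bit3 = vec($bits, 3, 1);
my $bit2 = vec($bits, 2, 1);

print "chars=$chars bytes=$bytes version=$version bit3=$bit3 bit2=$bit2\n";
```

None of these three values is "Unicode text" in any useful sense, which is why a strict "legal Unicode codepoint" check on every string would get in the way of them.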