I have a quick question about the Str type, described in Synopsis 2:

  Str     Perl string (finite sequence of Unicode characters)

Specifically, and partly in the interest of future-proofing, does Str support representing codepoint numbers beyond the range currently described in the Unicode spec? E.g., can someone validly say "\x[263a123456789]" and pass the result around as a Str value?

Or would there potentially be language constraints to prevent such from compiling/executing?

I think it would be useful for the above to be allowed, so that one could still encode future, larger codepoints under an older Perl that doesn't attribute any meaning to them and just falls back to treating the Str as a generic string of integers, which is what happens by default when you don't have special character tables handy, AFAIK.

That's not to say you can't also have a stricter subtype defined, e.g. Uni5_1Str, which includes just the characters defined by Unicode version 5.1, for people who want that.
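To illustrate the distinction between the lax type and a stricter subtype, here is a rough sketch in Python rather than Perl 6, since what Perl 6 will actually allow is exactly what I'm asking about. All names here (make_text, is_current_unicode, UNICODE_MAX) are hypothetical, and the range check is only a stand-in for a real per-version character table such as Uni5_1Str would need.

```python
from typing import Iterable, Tuple

# Assumption: 0x10FFFF is the highest codepoint in the current Unicode range.
UNICODE_MAX = 0x10FFFF


def make_text(codepoints: Iterable[int]) -> Tuple[int, ...]:
    """Lax text value: any finite sequence of non-negative integers."""
    cps = tuple(codepoints)
    for cp in cps:
        if not isinstance(cp, int) or cp < 0:
            raise ValueError(f"not a codepoint number: {cp!r}")
    return cps


def is_current_unicode(text: Tuple[int, ...]) -> bool:
    """Stricter subtype check: every codepoint fits in today's range."""
    return all(cp <= UNICODE_MAX for cp in text)


smiley = make_text([0x263A])           # within the current range
future = make_text([0x263A123456789])  # beyond it, but still a valid lax value
```

Here both values are accepted by the lax constructor, while only smiley would pass the stricter check.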

So if Perl's Str is lax in this way, I think it should be documented somewhere that a Str may contain a sequence of potential, and not just actual, Unicode characters. Or if that already is documented, please say where.

And I want to emphasize that I'm not proposing changing the logical/conceptual meaning of Str, it is still defined as a string of characters, not as a string of integers.

One reason I'm asking is that I want the Text type of my Muldis D language to support arbitrarily large codepoints, partly for future-proofing, and I'm hoping to be able to say that, when mapping the language to Perl 6, any Text value can be represented simply by a Perl 6 Str value. But if Perl 6's Str isn't likely to be that flexible, then I'd like to know for my planning purposes.

Thank you. -- Darren Duncan
