about the Str type and Unicode

Darren Duncan Thu, 12 Mar 2009 17:28:12 -0700

I have a quick question about the Str type, described in Synopsis 2:


  Str     Perl string (finite sequence of Unicode characters)

Specifically, and partly in the interest of future-proofing, is there support in Str for representing codepoint numbers that are beyond the range currently described in the Unicode spec; e.g., can someone validly say "\x[263a123456789]" and pass that around as a Str value?

Or would there potentially be language constraints to prevent such from compiling/executing?

I think it would be useful for the above to be allowed, so that one could still encode future larger codepoints under an older Perl that doesn't attribute any meaning to them and just falls back to treating the Str as a generic string of integers, which is what happens by default when you don't have special character tables handy, AFAIK.

That's not to say you can't also have a stricter subtype defined, e.g. Uni5_1Str, which includes just the characters defined by Unicode version 5.1, for cases where people want to use that.
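
To make the lax/strict distinction concrete, here is a minimal sketch in Python (not Perl 6). It models a string as a plain sequence of integers and checks a stricter subtype on top of it. All names here are hypothetical and come from neither Perl 6 nor Muldis D, and the strict check only tests the codespace range (up to U+10FFFF); it does not consult the character tables of any particular Unicode version, as a real Uni5_1Str would need to.

```python
# Hypothetical illustration only: these names are not part of Perl 6 or Muldis D.
MAX_UNICODE = 0x10FFFF  # highest codepoint in the current Unicode codespace


def is_lax_text(codepoints):
    """A 'lax' string: any finite sequence of non-negative integers."""
    return all(isinstance(cp, int) and cp >= 0 for cp in codepoints)


def is_strict_unicode(codepoints):
    """A stricter subtype: every element also lies within the Unicode codespace."""
    return is_lax_text(codepoints) and all(cp <= MAX_UNICODE for cp in codepoints)


# The second value is far beyond U+10FFFF, like the "\x[263a123456789]" example.
future = (0x263A, 0x263A123456789)
today = (0x263A, 0x41)
```

Under this sketch, `future` passes the lax check but fails the strict one, while `today` passes both.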

So if Perl's Str is lax in this way, I think it should be documented somewhere that a Str may contain a sequence of potential and not just actual Unicode characters. Or if that already is documented, please say where.

And I want to emphasize that I'm not proposing changing the logical/conceptual meaning of Str; it is still defined as a string of characters, not as a string of integers.

One reason I'm asking is that I wanted to make the Text type of my Muldis D language support arbitrarily large codepoints, partly for future-proofing, and I'm hoping to be able to say that, when mapping the language to Perl 6, any Text value can be represented simply by a Perl 6 Str value. But if Perl 6's Str isn't likely to be that flexible, then I'd like to know for my planning purposes.


Thank you. -- Darren Duncan
