Re: [svn:perl6-synopsis] r14431 - doc/trunk/design/syn

Larry Wall Sat, 04 Aug 2007 12:57:26 -0700

On Sat, Aug 04, 2007 at 12:55:58PM -0500, Patrick R. Michaud wrote:
: On Thu, Aug 02, 2007 at 04:19:18PM -0700, [EMAIL PROTECTED] wrote:
: >  Increment of a C<Str> (in a suitable container) works similarly to
: >  Perl 5, but is generalized slightly.  First, the string is examined
: >  to see if it could be the string representation of a number in
: >  any common representation, including floating point and radix
: >  notation. (Surrounding whitespace is also allowed around such a
: >  number.)  If it appears to be a number, it is converted to a number
: >  and incremented as a number.  
: 
: Just for verification:  an increment of "0xff" will therefore
: result in 256 and not "0xfg".  Correct?


Correct.  Likewise ":16<ff>".  I'm only wondering whether we should
also include complex number representations here.  :)

I suppose one could argue that "0xff" should increment to "0x100"...
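
For concreteness, here is a minimal Python sketch (Python only so that it can be run today) of the "looks like a number, so increment it as a number" rule quoted above. The helper names parse_numeric and increment are made up for illustration, and the set of accepted notations (0x/0o/0b/0d prefixes, the :16<ff> form, plain decimal and floating point) is an assumption about what "any common representation" would cover, not something the spec settles.

```python
import re

# Hypothetical helpers; the accepted notations are an assumption.
_RADIX_PREFIX = {"0x": 16, "0o": 8, "0b": 2, "0d": 10}
_ADVERB = re.compile(r":(\d+)<([0-9A-Za-z]+)>\Z")          # e.g. :16<ff>
_DECIMAL = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\Z")

def parse_numeric(s):
    """Return the numeric value of s, or None if s does not look like a number."""
    t = s.strip()                      # surrounding whitespace is allowed
    m = _ADVERB.match(t)
    if m:
        try:
            return int(m.group(2), int(m.group(1)))
        except ValueError:             # bad radix or digit out of range
            return None
    base = _RADIX_PREFIX.get(t[:2].lower())
    if base is not None:
        try:
            return int(t[2:], base)
        except ValueError:             # e.g. "0xfg"
            return None
    if _DECIMAL.match(t):
        return float(t) if any(c in t for c in ".eE") else int(t)
    return None

def increment(s):
    """Increment s as a number; the caller falls back to string rules otherwise."""
    n = parse_numeric(s)
    if n is None:
        raise ValueError("not numeric; use the string increment rules instead")
    return n + 1

print(increment("0xff"))      # 256
print(increment(":16<ff>"))   # 256
```

Under this reading the result is a number, so the "0x" spelling is lost; that is exactly the trade-off behind the "0x100" alternative.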

: >  final alphanumeric sequence in the string.  Unlike in Perl 5, this
: >  alphanumeric sequence need not be anchored to the beginning of the
: >  string, nor does it need to begin with an alphabetic character; the
: >  final sequence in the string matching C<\w+> is incremented regardless
: >  of what comes before it.  
: 
: ...does the \w+ include non-ASCII alphanumerics and underscore?  
: Or should the spec limit itself to [A-Za-z0-9]+ here?  If we
: include non-ASCII alphanumerics, then incrementing something like
: "résumé" produces "résumf" ?

Hmm, good point.  Could probably limit alphas to ASCII if we wanted
to be culturally insensitive, though we could easily include all the
contiguous Unicode digit ranges that go from 0 to 9.  Which, oddly,
doesn't include the numeric dingbats, which tend to start at 1, and if
there's a corresponding 0, it's not the codepoint before the 1.  I can
see an argument for allowing such characters to increment though:

    for '❶' .. '❿' { .say }

But it's not clear what to do if you try to increment ❿.
Probably just return a failure.
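
A small Python sketch of that non-wrapping behavior, assuming the range is exactly ❶ (U+2776) through ❿ (U+277F). The function names are hypothetical, and returning None is only a stand-in for "return a failure".

```python
# Negative circled digits: a range that starts at 1 and has no zero to wrap to.
FIRST, LAST = ord("\u2776"), ord("\u277f")   # ❶ .. ❿

def succ_dingbat(ch):
    """Return the next dingbat, or None (standing in for a failure) after ❿."""
    cp = ord(ch)
    if not FIRST <= cp <= LAST:
        raise ValueError("not a negative circled digit")
    if cp == LAST:
        return None                    # nothing sensible comes after ❿
    return chr(cp + 1)

def dingbat_range(start, stop):
    """Yield start..stop inclusive, like '❶' .. '❿'."""
    ch = start
    while ch is not None and ord(ch) <= ord(stop):
        yield ch
        ch = succ_dingbat(ch)

for d in dingbat_range("\u2776", "\u277f"):
    print(d)
```

The loop prints ten characters, and an explicit succ_dingbat on the last one yields the failure value instead of wrapping.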

Or we could stick with \w+, which makes sense for various alphabets
like Greek and Hebrew, and just let "résumé" turn into, not "résumf",
but rather "résumê", since the decrement should be the reverse of
the increment.

Except it's not really right for Greek, since the basic letters run
into other precomposed letters after omega.  Basically we'd need to
identify all wrappable alphabet ranges, which probably leaves out
all accented characters, which means that "résumé" would turn into
"résuné" presumably.  Which basically means we'd need to define our
own character class for wrappable alphanumerics.  Possibly we could
define it algorithmically based on current Unicode data, but that
would tend to include the entire CJK area as one alphabet, which is
not going to make much sense to anyone, especially since most legacy
Asian fonts don't provide all the characters.  For now I'm just going
to hardcode the ranges in the spec, I think.  We'll also maybe have
to hardcode which ranges wrap and which ones don't, if we want to
allow incrementing numeric dingbats.

Which I think would be way cool, actually:

    for @points Z '⒈'  .. * -> $p, $n { say "$n\t$p" }

or for roman numerals, since the numeral is distinguished as a separate
character from the latin letter:

    for @points Z 'ⅰ'  .. * -> $p, $n { say "$n.\t$p" }
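
To make the hardcoded-ranges idea concrete, here is a rough Python sketch. The RANGES table is an illustrative assumption (ASCII digits and letters, one extra Unicode digit range, and the non-wrapping dingbats), not the list that would go in the spec, and increment_string is a hypothetical name. It increments the final run of characters that belong to some listed range, carrying leftwards the way Perl 5 does.

```python
# (first, last, wraps): hypothetical hardcoded table, not the spec's list.
RANGES = [
    ("0", "9", True),
    ("A", "Z", True),
    ("a", "z", True),
    ("\u0660", "\u0669", True),    # Arabic-Indic digits 0..9
    ("\u2776", "\u277f", False),   # dingbats 1..10: no zero, so no wrapping
]

def _find(ch):
    """Return the (first, last, wraps) entry containing ch, or None."""
    for entry in RANGES:
        if entry[0] <= ch <= entry[1]:
            return entry
    return None

def increment_string(s):
    # Locate the final run of characters that belong to any listed range.
    end = len(s)
    while end > 0 and _find(s[end - 1]) is None:
        end -= 1
    if end == 0:
        raise ValueError("nothing incrementable in string")
    start = end
    while start > 0 and _find(s[start - 1]) is not None:
        start -= 1
    run = list(s[start:end])
    i = len(run) - 1
    while True:
        first, last, wraps = _find(run[i])
        if run[i] != last:
            run[i] = chr(ord(run[i]) + 1)
            break
        if not wraps:
            raise OverflowError("range does not wrap")
        run[i] = first                 # wrap this position and carry left
        if i == 0:
            # Carry out of the run: digits gain a leading one, letters a
            # leading first letter ("99" -> "100", "zz" -> "aaa").
            run.insert(0, chr(ord(first) + 1) if first.isdecimal() else first)
            break
        i -= 1
    return s[:start] + "".join(run) + s[end:]

print(increment_string("Az"))   # Ba
print(increment_string("a9"))   # b0
```

With accented characters left out of the table, "résumé" comes out as "résuné", since the final incrementable run is "sum".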

A sufficiently motivated person could make roman numerals work right
up to the limits of the notation, assuming we allow varying numbers
of characters.  But then it's not clear whether ⅹ should increment
to ⅺ or to ⅹⅰ.  (And yes, those are different.)  Of course, we
could also treat ⅲ as a precomposed ⅰⅰⅰ.  Not sure which way
that argues; it'd be kinda strange to use precomposed forms just for
this one purpose.  Also, there are only precomposed characters in the
digits range; there's no precomposed ⅽⅹⅹⅹ form, for instance.
They probably did precomposed up to twelve just for clocks, and
maybe because ⅰⅰⅰ looks too spread out in a monospace font.

If we make roman numerals increment, then I think that also argues
for making "0xff" stay a string too.  Basically a "0x" on the front
would pick 0..9a..f as the "alphabetic" range for the rest of it.
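
As a sketch of what that would mean, in Python and with a hypothetical function name: the "0x" prefix is kept as-is, and the remaining characters are incremented digit by digit over the alphabet 0..9a..f. Lowercase-only input is a simplifying assumption here.

```python
HEX_DIGITS = "0123456789abcdef"    # the "alphabetic" range selected by "0x"

def succ_hex_string(s):
    """Increment a '0x...' string while keeping it a string."""
    body = s[2:]
    if not s.startswith("0x") or not body or any(c not in HEX_DIGITS for c in body):
        raise ValueError("expected 0x followed by lowercase hex digits")
    digits = list(body)
    i = len(digits) - 1
    while i >= 0:
        pos = HEX_DIGITS.index(digits[i])
        if pos < len(HEX_DIGITS) - 1:
            digits[i] = HEX_DIGITS[pos + 1]
            break
        digits[i] = "0"                # wrap f -> 0 and carry left
        i -= 1
    else:
        digits.insert(0, "1")          # carried past the leftmost digit
    return "0x" + "".join(digits)

print(succ_hex_string("0xff"))   # 0x100
```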

Arguably this could all be handled by a function that takes an arbitrary
string and converts it to a typed string with the appropriate .succ
and .pred methods.  Maybe an appropriate set of multis would be most
extensible.  Or a multi-token:

    multi token numrange:<0b>  (--> StrBinary) { '0b' <[0..1]>+ }
    multi token numrange:<0o>  (--> StrOctal)  { '0o' <[0..7]>+ }
    multi token numrange:<0d>  (--> StrDec)    { '0d' <[0..9]>+ }
    multi token numrange:<0x>  (--> StrHex)    { '0x' <[0..9a..fA..F]>+ }
    multi token numrange:roman (--> StrRoman)  { <[ Ⅰ .. ↂ  ]> }
    etc.

Maybe these are all just mixins of various Incremental roles.
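
A loose Python analogue of the "function that returns a typed string" idea, with heavy caveats: the class names mirror the return types in the tokens above, but the base class PrefixedStr, the classify function, and the choice to preserve zero-padded width are all assumptions made for illustration. Uppercase hex digits and the roman token are left out to keep it short.

```python
import re

class PrefixedStr:
    """Hypothetical base: a prefix plus digits drawn from a fixed alphabet."""
    prefix = ""
    alphabet = ""

    def __init__(self, text):
        self.digits = text[len(self.prefix):]

    def __str__(self):
        return self.prefix + self.digits

    def _step(self, delta):
        base = len(self.alphabet)
        value = 0
        for ch in self.digits:
            value = value * base + self.alphabet.index(ch)
        value += delta
        if value < 0:
            raise OverflowError("no predecessor below zero")
        out = ""
        while value:
            value, r = divmod(value, base)
            out = self.alphabet[r] + out
        # Keep at least the original width, padding with the zero digit.
        out = out.rjust(len(self.digits), self.alphabet[0])
        return type(self)(self.prefix + out)

    def succ(self):
        return self._step(+1)

    def pred(self):
        return self._step(-1)

class StrBinary(PrefixedStr):
    prefix, alphabet = "0b", "01"

class StrOctal(PrefixedStr):
    prefix, alphabet = "0o", "01234567"

class StrDec(PrefixedStr):
    prefix, alphabet = "0d", "0123456789"

class StrHex(PrefixedStr):
    prefix, alphabet = "0x", "0123456789abcdef"

_KINDS = [StrBinary, StrOctal, StrDec, StrHex]

def classify(text):
    """Return a typed string for text, or None if no known pattern matches."""
    for kind in _KINDS:
        pattern = re.escape(kind.prefix) + "[" + kind.alphabet + "]+"
        if re.fullmatch(pattern, text):
            return kind(text)
    return None

print(classify("0b0111").succ())   # 0b1000
print(classify("0o17").pred())     # 0o16
```

Adding another notation then means adding one more class to the list, which is roughly the extensibility argument for multis.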

That's probably more than enough speculation for now...

Larry
