On Sat, Aug 04, 2007 at 12:55:58PM -0500, Patrick R. Michaud wrote:
: On Thu, Aug 02, 2007 at 04:19:18PM -0700, [EMAIL PROTECTED] wrote:
: > Increment of a C<Str> (in a suitable container) works similarly to
: > Perl 5, but is generalized slightly.  First, the string is examined
: > to see if it could be the string representation of a number in
: > any common representation, including floating point and radix
: > notation.  (Surrounding whitespace is also allowed around such a
: > number.)  If it appears to be a number, it is converted to a number
: > and incremented as a number.
:
: Just for verification:  an increment of "0xff" will therefore
: result in 256 and not "0xfg".  Correct?

Correct.  Likewise ":16<ff>".  I'm only wondering whether we should also
include complex number representations here.  :)

I suppose one could argue that "0xff" should increment to "0x100"...

: > final alphanumeric sequence in the string.  Unlike in Perl 5, this
: > alphanumeric sequence need not be anchored to the beginning of the
: > string, nor does it need to begin with an alphabetic character; the
: > final sequence in the string matching C<\w+> is incremented regardless
: > of what comes before it.
:
: ...does the \w+ include non-ASCII alphanumerics and underscore?
: Or should the spec limit itself to [A-Za-z0-9]+ here?  If we
: include non-ASCII alphanumerics, then incrementing something like
: "résumé" produces "résumf" ?

Hmm, good point.  Could probably limit alphas to ASCII if we wanted to
be culturally insensitive, though we could easily include all the
contiguous Unicode digit ranges that go from 0 to 9.  Which, oddly,
doesn't include the numeric dingbats, which tend to start at 1, and if
there's a corresponding 0, it's not the codepoint before the 1.  I can
see an argument for allowing such characters to increment though:

    for '❶' .. '❿' { .say }

But it's not clear what to do if you try to increment ❿.  Probably
just return a failure.

Or we could stick with \w+, which makes sense for various alphabets
like Greek and Hebrew, and just let "résumé" turn into, not "résumf",
but rather "résumê", since the decrement should be the reverse of the
increment.  Except it's not really right for Greek, since the basic
letters run into other precomposed letters after omega.  Basically we'd
need to identify all wrappable alphabet ranges, which probably leaves
out all accented characters, which means that "résumé" would turn into
"résuné" presumably.  Which basically means we'd need to define our own
character class for wrappable alphanumerics.
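
To make the wrappable-range idea concrete, here is a minimal sketch in
Python (not Perl 6).  Everything named in it is an assumption made for
illustration: the WRAPPABLE table, the function name incr_wrappable, and
the rule that alphanumerics outside every wrappable range (such as "é")
are simply skipped when looking for something to increment.  It is one
possible reading of the paragraph above, not what the spec says.

```python
# Hypothetical table of wrappable ranges (first, last), inclusive.
# A real spec would presumably list many more ranges than these three.
WRAPPABLE = [("a", "z"), ("A", "Z"), ("0", "9")]


def find_range(ch):
    """Return the (first, last) pair containing ch, or None."""
    for first, last in WRAPPABLE:
        if first <= ch <= last:
            return first, last
    return None


def incr_wrappable(s):
    """Increment the final alphanumeric sequence of s, Perl-5 style,
    touching only characters that belong to a wrappable range."""
    chars = list(s)
    i = len(chars) - 1
    # Skip anything that follows the final alphanumeric sequence.
    while i >= 0 and not chars[i].isalnum():
        i -= 1
    leftmost = None  # position and range of the leftmost char that wrapped
    while i >= 0 and chars[i].isalnum():
        rng = find_range(chars[i])
        if rng is not None:
            first, last = rng
            if chars[i] != last:
                # Ordinary case: bump this character and stop.
                chars[i] = chr(ord(chars[i]) + 1)
                return "".join(chars)
            # At the end of its range: wrap around and carry leftward.
            chars[i] = first
            leftmost = (i, rng)
        i -= 1
    if leftmost is None:
        # Assumption: with nothing wrappable, fail rather than guess.
        raise ValueError("nothing incrementable in %r" % s)
    # Every wrappable character wrapped, so the sequence grows by one.
    pos, (first, _) = leftmost
    chars.insert(pos, "1" if first == "0" else first)
    return "".join(chars)


print(incr_wrappable("Az"))                # Ba
print(incr_wrappable("zz"))                # aaa
print(incr_wrappable("r\u00e9sum\u00e9"))  # résuné
```

Under these assumptions the accented final letter is left alone and the
"m" before it is what gets incremented, which is how "résumé" ends up as
"résuné".
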
Possibly we could define that class algorithmically based on current
Unicode data, but that would tend to include the entire CJK area as one
alphabet, which is not going to make much sense to anyone, especially
since most legacy Asian fonts don't provide all the characters.

For now I'm just going to hardcode the ranges in the spec, I think.
We'll also maybe have to hardcode which ranges wrap and which ones don't,
if we want to allow incrementing numeric dingbats.  Which I think
would be way cool, actually:

    for @points Z '⒈' .. * -> $p, $n { say "$n\t$p" }

or for roman numerals, since the numeral is distinguished as a separate
character from the Latin letter:

    for @points Z 'ⅰ' .. * -> $p, $n { say "$n.\t$p" }

A sufficiently motivated person could make roman numerals work right up
to the limits of the notation, assuming we allow varying numbers of
characters.  But then it's not clear whether ⅹ should increment to ⅺ or
to ⅹⅰ.  (And yes, those are different.)  Of course, we could also treat
ⅲ as a precomposed ⅰⅰⅰ.  Not sure which way that argues; it'd be kinda
strange to use precomposed forms just for this one purpose.  Also, there
are only precomposed characters in the digits range; there's no
precomposed ⅽⅹⅹⅹ form, for instance.  They probably did precomposed up
to twelve just for clocks, and maybe because ⅰⅰⅰ looks too spread out in
a monospace font.

If we make roman numerals increment, then I think that also argues for
making "0xff" stay a string too.  Basically a "0x" on the front would
pick 0..9a..f as the "alphabetic" range for the rest of it.

Arguably this could all be handled by a function that takes an arbitrary
string and converts it to a typed string with the appropriate .succ
and .pred methods.  Maybe an appropriate set of multis would be most
extensible.
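
As a rough illustration of the typed-string idea, here is a minimal
sketch in Python (not Perl 6).  The names RadixStr, RADIX, and typed are
hypothetical, and so are the choices to keep the original digit width
and letter case and to refuse to decrement below zero.  Roman numerals
are left out entirely.

```python
import re

# Hypothetical prefix table: radix prefix -> base.
RADIX = {"0b": 2, "0o": 8, "0d": 10, "0x": 16}
DIGITS = "0123456789abcdef"


class RadixStr:
    """A string such as "0xff" that stays a string when incremented."""

    def __init__(self, prefix, body):
        self.prefix = prefix
        self.body = body
        self.radix = RADIX[prefix]

    def _render(self, value):
        # Convert value back to digits in this radix.
        out = ""
        while value:
            value, digit = divmod(value, self.radix)
            out = DIGITS[digit] + out
        # Assumption: keep at least the original width and letter case.
        out = out.rjust(len(self.body), "0")
        if self.body.isupper():
            out = out.upper()
        return RadixStr(self.prefix, out)

    def succ(self):
        return self._render(int(self.body, self.radix) + 1)

    def pred(self):
        value = int(self.body, self.radix)
        if value == 0:
            raise ValueError("cannot decrement below zero")
        return self._render(value - 1)

    def __str__(self):
        return self.prefix + self.body


def typed(s):
    """Return a RadixStr for s, or None if s is not a prefixed numeral."""
    m = re.fullmatch(r"(0[bodx])([0-9a-fA-F]+)", s)
    if m is None:
        return None
    prefix, body = m.groups()
    try:
        int(body, RADIX[prefix])  # reject digits outside the radix
    except ValueError:
        return None
    return RadixStr(prefix, body)


print(typed("0xff").succ())   # 0x100
print(typed("0b011").succ())  # 0b100
```

With this reading "0xff" increments to "0x100" rather than to 256, and
a string like "0xfg" is not recognized as a typed string at all, so it
would fall through to whatever the ordinary string increment does.
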
Or a multi-token:

    multi token numrange:<0b> (--> StrBinary) { '0b' <[0..1]>+ }
    multi token numrange:<0o> (--> StrOctal)  { '0o' <[0..7]>+ }
    multi token numrange:<0d> (--> StrDec)    { '0d' <[0..9]>+ }
    multi token numrange:<0x> (--> StrHex)    { '0x' <[0..9a..fA..F]>+ }
    multi token numrange:roman (--> StrRoman) { <[ Ⅰ .. ↂ ]> }

etc.  Maybe these are all just mixins of various Incremental roles.

That's probably more than enough speculation for now...

Larry