On Fri, Sep 17, 2004 at 09:57:14AM -0500, Jonathan Scott Duff wrote:
: Now for the bothersome parts and some questions and some suggestions in
: no particular order:
:
: - for minimal matching the ? is too far away from the operator that it
:   applies to. It looks like it's doing something to the closure (and
:   maybe it is) Should that be [foo]**?{$m..$n} instead?

Yes, I felt that way too, and considered doing exactly what you suggest,
but decided that it doesn't make sense to make odd syntactic exceptions
for infrequently used constructs.  We're trying to have fewer random
exceptions in Perl 6 than in Perl 5.

: - Must the closure take the exact form of stuff in curlies? What
:   would these do?
:       $c = sub { 0..5 };
:       /[foo]**$c/;         # error?
:       /[foo]**&somesub/;   # error?

Yes, those are not allowed.  I considered doing that too, and rejected
it for similar reasons.

: - Is the rationale behind making [foo]**{1,3} illegal strictly to
:   catch the semantic error of those migrating from perl 5? Because it
:   certainly seems like it could be a useful thing otherwise.

Right.  The idea is that someday we could allow random lists, perhaps
even immediately if people allow it by pragma, and if the regex engine
actually supports it, which is not a sure thing.  (On the other hand,
it can actually be written now as an assertion on the number of matches
of a previous $1, so there's no big pressure to make it work, and there
may never be enough pressure.)

: - because the closure is executed first, you have to read ahead to the
:   end of the closure and then look back to see what you were
:   quantifying when trying to grok the code. This isn't such a big deal
:   if you just have a range, but it's a closure so all sorts of things
:   can be in there!

Yes, that's a potential problem, just as you can do all sorts of stuff
in the condition of a C<while> statement modifier.  Cultural pressure
will tend to work against that.

: - Bringing a closure into the picture seems to put too much power in
:   such a simple construct. [foo]**{ destroy_the_world; 0... }

No more power than closures anywhere else in the regex.  No more power
than plain old Perl outside the regex.  I don't see why this is any kind
of an issue at all.  The mere possibility of obfuscation is not
something Perl has ever been designed against.  If anything, the
opposite is true.
Expressive power can be used either for good or ill, and Perl has
generally opted for more potential goodness.

: - I've always viewed the minimal matching ? as a kind of modifier on
:   the quantifiers. If that illusion is to remain true in Perl6,
:   I'd want an optional colon [foo]*:?

By that argument, the * is also a modifier and should have a colon. :-)

:   Whitespace would disambiguate the
:   "modifier colon" from the "no backtrack" or "cut" operator (it would
:   parse as [ foo ] * :? I also seem to recall we already have a whitespace
:   disambiguation rule for ::). And if we apply this idea to the range
:   quantifier, that would give us something like these:
:
:       [foo]*:5            # match exactly 5 times
:       [foo]*:{0...}       # verbose [foo]*
:       [foo]*:{1...}       # verbose [foo]+
:       [foo]*:{1..5}       # match from 1 to 5 times
:       [foo]*:{[1,3,5]}    # match exactly 1, 3, or 5 times
:       [foo]*:[EMAIL PROTECTED]   # treat each element of @foo as a
:                           # number and only match that
:                           # many times. (same as previous
:                           # basically)
:       [foo]*:{&foo}       # match based on the return value of &foo
:       [foo]*:{%foo}       # ???
:
:   Those last few suddenly make me want junctioned ranges, though I
:   don't know what I'd use them for :)

I see no simplifications here from the point of view of either the
parser or the human.  All I see are pitfalls.  *: is rather ambiguous
with existing constructs.  ** is completely illegal, just as *? and +?
were before we added the minimal modifier.  Again, this is a
seldom-used feature, and doesn't deserve special lookahead rules to
determine that the colon doesn't mean backtracking.  Also, all the
other :foo modifiers modify the things after them, not the things
before them.

: - An alternate syntax was proposed on IRC yesterday. I'm not sure if I
:   remember the specifics right, but the gist of it is to use a ~
:   character to offset the ranges, so ...

This feature is so completely not worth Yet Another Metacharacter.

: On the whole, I liked the simplicity of the old <$m..$n> (or even
: <$m,$n>) and would like something just like it only without the
: ambiguity of <$m>. I'd even suggest <+$m> as a disambiguating mechanism
: if we weren't using + and - for "character" classes.

**{$m..$n} and **{$m} are precisely one character longer than the
<+$m..$n> and <+$m> you are advocating.  They have the mnemonic value
of * without the possibility of being confused with *.  They have the
right Huffman coding with respect to the common quantifiers.  They
don't visually pretend to be subrules when they're not, or entice
people to try to turn them into captures.  They indicate visually that
Perl code is potentially being run by use of the braces.  The Perl code
cannot be confused with the ** outside.  The ** outside cannot be
confused with the Perl code inside.  Best of all, there are no extra,
additional, optional, special rules to explain.

Larry
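
For readers who want to poke at the semantics of **{$m..$n} and **{$m},
here is a rough sketch in Python rather than Perl 6, using only the
standard re module.  It is an illustration under assumptions, not how
any Perl 6 engine works: the names repeat, closure, and minimal are
hypothetical, and the unbounded 0... form is not covered.  The closure
runs first and returns either a single count or a contiguous range,
which becomes an ordinary {m,n} quantifier; a plain list such as [1, 3]
is rejected, mirroring the decision to keep **{1,3} illegal for now.

```python
import re
from typing import Callable, Union

Count = Union[int, range]


def repeat(atom: str, closure: Callable[[], Count], minimal: bool = False) -> str:
    """Return a Python regex that repeats `atom` as many times as `closure` says.

    Hypothetical helper: it only approximates **{...} with Python's {m,n}.
    """
    count = closure()  # the closure is evaluated before any matching happens
    if isinstance(count, int):
        quantifier = "{%d}" % count
    elif isinstance(count, range) and count.step == 1 and len(count) > 0:
        # Python ranges exclude the stop value, so 1..5 is range(1, 6).
        quantifier = "{%d,%d}" % (count.start, count.stop - 1)
    else:
        # Arbitrary lists of counts are deliberately not supported here.
        raise ValueError("closure must return an int or a contiguous range")
    lazy = "?" if minimal else ""  # minimal matching, as with *? and +?
    return "(?:%s)%s%s" % (atom, quantifier, lazy)


text = "foofoofoofoo"
greedy = re.match(repeat("foo", lambda: range(1, 4)), text)
frugal = re.match(repeat("foo", lambda: range(1, 4), minimal=True), text)
print(greedy.group(0))  # foofoofoo
print(frugal.group(0))  # foo
```

Calling repeat("foo", lambda: [1, 3]) raises ValueError in this sketch,
which is the Python-side analogue of treating a list of counts as an
error rather than silently guessing what was meant.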