S5: range quantifier woes

Jonathan Scott Duff Fri, 17 Sep 2004 07:57:32 -0700

The new range quantifier syntax has been bothering me.  For reference,
here's the bit of S5 that talks about it:


> The repetition specifier is now **{...} for maximal matching, with a
> corresponding **{...}? for minimal matching. Space is allowed on
> either side of the asterisks. The curlies are taken to be a closure
> returning a number or a range.
> 
>    / value was (\d ** {1..6}?) with ([\w]**{$m..$n}) /
> 
> It is illegal to return a list, so this easy mistake fails:
> 
>     / [foo]**{1,3}
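
For readers who think in Perl 5-style quantifiers, the maximal/minimal distinction in the quoted passage is the same one as {m,n} versus {m,n}?. The sketch below uses Python's re module purely as a stand-in (it is not Perl 6 syntax, and the sample string is made up) to show how the two forms split the same input:

```python
import re

text = "12345678"

# Maximal (greedy): the range quantifier takes as many digits as it can, up to 6.
greedy = re.match(r"(\d{1,6})(\d*)", text).groups()

# Minimal (non-greedy): the trailing ? makes it take as few as it can, here 1.
lazy = re.match(r"(\d{1,6}?)(\d*)", text).groups()

print(greedy)  # ('123456', '78')
print(lazy)    # ('1', '2345678')
```

In both cases the trailing ? belongs to the quantifier rather than to the range itself, which is the point the first item below turns on.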

Now for the bothersome parts and some questions and some suggestions in
no particular order:

- for minimal matching the ? is too far away from the operator that it
  applies to. It looks like it's doing something to the closure (and
  maybe it is). Should that be [foo]**?{$m..$n} instead?
- Must the closure take the exact form of stuff in curlies?  What
  would these do?  
        $c = sub { 0..5 };  
        /[foo]**$c/;                    # error?
        /[foo]**&somesub/;              # error?
- Is the rationale behind making [foo]**{1,3} illegal strictly to
  catch the semantic error of those migrating from perl 5? Because it
  certainly seems like it could be a useful thing otherwise.
- because the closure is executed first, you have to read ahead to the
  end of the closure and then look back to see what you were
  quantifying when trying to grok the code. This isn't such a big deal
  if you just have a range, but it's a closure so all sorts of things
  can be in there!
- Bringing a closure into the picture seems to put too much power in
  such a simple construct.  [foo]**{ destroy_the_world; 0... }
- I've always viewed the minimal matching ? as a kind of modifier on
  the quantifiers. If that illusion is to remain true in Perl6,
  I'd want an optional colon: [foo]*:? Whitespace would disambiguate the
  "modifier colon" from the "no backtrack" or "cut" operator (it would
  parse as [ foo ] * :? and I also seem to recall that we already have a
  whitespace disambiguation rule for ::). And if we apply this idea to
  the range quantifier, that would give us something like these:

        [foo]*:5                # match exactly 5 times
        [foo]*:{0...}           # verbose [foo]*
        [foo]*:{1...}           # verbose [foo]+
        [foo]*:{1..5}           # match from 1 to 5 times
        [foo]*:{[1,3,5]}        # match exactly 1, 3, or 5 times
        [foo]*:{@foo}           # treat each element of @foo as a
                                # number and only match that
                                # many times. (same as previous
                                # basically)
        [foo]*:{&foo}           # match based on the return value of &foo
        [foo]*:{%foo}           # ???

  Those last few suddenly make me want junctioned ranges, though I
  don't know what I'd use them for :)

- An alternate syntax was proposed on IRC yesterday. I'm not sure if I
  remember the specifics right, but the gist of it is to use a ~
  character to offset the ranges, so ...

        [foo]~5                 # match exactly 5 times
        [foo]~{0...}            # verbose [foo]*
        [foo]~{1...}            # verbose [foo]+
        [foo]~{1..5}            # match from 1 to 5 times
        [foo]~{[1,3,5]}         # match exactly 1, 3, or 5 times
        [foo]~{@foo}            # treat each element of @foo as a
                                # number and only match that
                                # many times. (same as previous
                                # basically)
        [foo]~{&foo}            # match based on the return value of &foo
        [foo]~{%foo}            # ???

  And surely these can be made to work:

        [foo]~[0...]            # [foo]:[0...]
        [foo]~[1,3,5]           # [foo]:[1,3,5]
        [foo]~@foo              # [foo]:@foo

Yes, I realize that the "bag" variants (e.g., /[foo]*:{@foo}/) could be
nightmarish for optimization (e.g. you can't assume monotonically
increasing values). And would "minimal match" mean stop when you've
reached the first number in the list, or do you have to evaluate the
whole thing and literally find the minimum value? (Similar reasoning
and questions apply for the regular greedy version.) These may be really
good arguments for not including that particular variant, but I don't
know that :-)
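
To make the "first number in the list" versus "literal minimum" question concrete, here is a rough sketch in Python (not Perl 6). The helper match_counts, the sample bag [3, 1, 5] and the input string are all hypothetical, and nothing here reflects how a real Perl 6 engine would implement the feature; it only shows that the two readings can give different answers for the same unordered list of counts:

```python
import re

def match_counts(atom, text, counts):
    """Try each repetition count in the order given and return the first
    count n for which atom repeated exactly n times matches at the start
    of text, or None if no count works."""
    for n in counts:
        if re.match(f"(?:{atom}){{{n}}}", text):
            return n
    return None

counts = [3, 1, 5]          # an unordered "bag" of allowed counts
text = "foofoofoofoofoo"    # five copies of "foo"

# Reading A: walk the list as written and stop at the first count that matches.
first_in_list = match_counts("foo", text, counts)

# Reading B: "minimal" means the numerically smallest count that matches,
# which forces the whole list to be evaluated (here, sorted) first.
minimal = match_counts("foo", text, sorted(counts))

# The greedy analogue of reading B: largest matching count first.
maximal = match_counts("foo", text, sorted(counts, reverse=True))

print(first_in_list, minimal, maximal)  # 3 1 5
```

Reading A needs no knowledge of the whole list but depends on element order; reading B is order-independent but has to see every element before it can start matching.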

----

On the whole, I liked the simplicity of the old <$m..$n> (or even
<$m,$n>) and would like something just like it only without the
ambiguity of <$m>. I'd even suggest <+$m> as a disambiguating mechanism
if we weren't using + and - for "character" classes.

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]
