Re: greedy/non-greedy regex assertions

Larry Wall Thu, 04 Jul 2002 10:27:27 -0700

On Thu, 4 Jul 2002, Ashley Winters wrote:
: I was pondering how to implement the Apocalypse 5 stuff (only pondering) and I
: was wondering if <Inf,0> could be legal, indicating a greedy match.
: 
: * = <Inf,0>
: + = <Inf,1>
: ? = <1,0>
: *? = <0,Inf>
: +? = <1,Inf>
: ?? = <0,1>


We could autoreverse, but it'd be a bad idea.  It doesn't work out
well when you want to parameterize it or write a code generator.
We don't do it for ranges either, for similar reasons.
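
To make the parameterization problem concrete, here is a minimal sketch in
Python, whose re module stands in for the Perl 6 engine. The helper
autoreversing_quantifier and its rule (descending bounds mean greedy,
ascending bounds mean minimal) are hypothetical, modelled on the <Inf,0>
proposal quoted above; nothing here is an actual Perl 6 API.

```python
import re

def autoreversing_quantifier(first, second):
    """Hypothetical autoreverse rule: <hi,lo> is greedy, <lo,hi> is minimal."""
    lo, hi = min(first, second), max(first, second)
    # Greediness is inferred from the order of the bounds, not stated explicitly.
    suffix = "" if first > second else "?"
    return "{%d,%d}%s" % (lo, hi, suffix)

def match_digits(first, second, text):
    """Match the leading digits of text using the autoreversed quantifier."""
    pattern = r"\d" + autoreversing_quantifier(first, second)
    return re.match(pattern, text).group(0)

greedy = match_digits(4, 1, "123456")    # bounds descend: behaves like \d{1,4}
minimal = match_digits(1, 4, "123456")   # bounds ascend: behaves like \d{1,4}?
```

When the two bounds arrive in variables, swapping them silently flips the
greediness, and equal bounds carry no direction at all (this sketch then
falls back to minimal). A code generator would have to guard against both.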

: Speaking of the range assertion, is there anything other than <x,y>? There 
: used to be discussion on the list about adding more possibilities, but I 
: didn't follow it.

We also allow <$x,$y>, <x,$y>, and <$x,y>, but that's about it.  There's some
talk about disambiguating from <$x> with a leading +:

    <+$x,$y>

So possibly we could make

    <-$y,$x>

count down instead.  But if we're gonna go all general on this, it'd be
better to do it the same way ranges do.

    <$y..$x:-1>

That's still ambiguous with <$x>, though.  Perhaps the general form is
just

    <* $start..$stop:$step >

That devolves nicely to

    <*2>

to match twice.

But there's one clinker in the works.  Currently

    {1,10}

is a maximal match, and you write

    {1,10}?

to get the minimal match.  But if

    <*$min..$max>

defaults to a step of 1, it counts up from $min and so is a minimal
match, and we'd write the maximal match with

    <*$max..$min:-1>

That goes against the historic precedent that the maximal match is
the unmarked form.  Much as I feel tempted to make minimal matching
the default across the board, I'm not sure we could pull that one off.
Too many neurons hardwired to think .* is greedy.
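
One way to picture the range-with-step reading is as the order in which
repetition counts get tried. The sketch below is in Python and rests on
assumptions: counts and first_count_that_fits are hypothetical helpers, and
the "first count that fits wins" rule is a simplification that only holds
because nothing follows the quantified atom. It is set beside Python's re
module, which keeps the Perl 5 convention for {1,10} and {1,10}?.

```python
import re

def counts(start, stop, step=1):
    """Repetition counts in the order a <*start..stop:step> quantifier might try them."""
    end = stop + (1 if step > 0 else -1)  # make the range inclusive of stop
    return list(range(start, end, step))

def first_count_that_fits(text, char, order):
    """Return the first count n in order such that text starts with char * n."""
    for n in order:
        if text.startswith(char * n):
            return n
    return None

text = "a" * 12

# Step 1 counts up from the minimum, so the shortest match is found first.
ascending = first_count_that_fits(text, "a", counts(1, 10))
# Step -1 counts down from the maximum, so the longest match is found first.
descending = first_count_that_fits(text, "a", counts(10, 1, -1))

# Python's re follows the Perl 5 convention: unmarked is maximal, ? is minimal.
maximal = len(re.match(r"a{1,10}", text).group(0))
minimal = len(re.match(r"a{1,10}?", text).group(0))
```

Under these assumptions the ascending order agrees with {1,10}? and the
descending order agrees with {1,10}, which is the mismatch with precedent
described above.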

So I'd guess that we just don't talk about :-1, but rather say that

    <*$min..$max>

is naturally greedy, and as with any quantifier you write

    <*$min..$max>?

to get minimal matching.

But sigh, it would fix so many novice bugs to make minimal matching
the default...
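
A typical instance of the sort of novice bug presumably meant here, sketched
in Python since its re module shares the greedy-by-default convention. The
sample string is made up for illustration.

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy .* runs to the last </b>, swallowing the markup in between.
greedy = re.findall(r"<b>(.*)</b>", html)

# Minimal .*? stops at the first </b> that lets the match succeed.
minimal = re.findall(r"<b>(.*?)</b>", html)
```

The greedy form yields one oversized capture, while the minimal form yields
the two captures a newcomer usually expects.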

Larry
