On Thu, Sep 06, 2007 at 12:37:37PM -0700, Larry Wall wrote:
> On Thu, Sep 06, 2007 at 01:25:12PM -0500, Patrick R. Michaud wrote:
> : On a somewhat similar question, what happens with a pattern
> : such as
> :
> : "foobar" ~~ / foo.+? | fooba /
> :
> : The LHS initially matches "foob", but with backtracking could
> : eventually match "foobar". Do the longest-token semantics
> : in this case cause the RHS to be dispatched first, even
> : though the token declaration of the LHS _could_ match a
> : longer token prefix?
>
> Yow. ICATBW. Non-greedy matching is somewhat antithetical to
> longest-token matching.
I agree. One thought I had was that perhaps non-greedy matching
could also terminate the token prefix.
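
For concreteness, here is a rough sketch in Python of how the competing
readings would rank the two alternatives on "foobar". It is only an
illustration: Python's re module uses ordered alternation and has no
longest-token matching, so the code merely measures how many characters
each reading would credit to each alternative. The variable names are
made up, and treating "non-greedy terminates the token prefix" as
matching plain foo is an assumption about how that rule would work.

```python
import re

text = "foobar"

# LHS under three readings of  foo.+?
lazy_len = re.match(r"foo.+?", text).end()    # minimal match: "foob"
greedy_len = re.match(r"foo.+", text).end()   # '?' stripped: "foobar"
prefix_len = re.match(r"foo", text).end()     # prefix cut at the quantifier (assumed): "foo"

# RHS is a plain literal
rhs_len = re.match(r"fooba", text).end()      # "fooba"

# Which alternative would a longest-token dispatcher try first?
strip_rule_winner = "LHS" if greedy_len > rhs_len else "RHS"
cut_rule_winner = "LHS" if prefix_len > rhs_len else "RHS"

print(lazy_len, greedy_len, prefix_len, rhs_len)
print(strip_rule_winner, cut_rule_winner)
```

Under the strip-the-? reading the LHS is tried first; if the non-greedy
quantifier ends the token prefix, the RHS is tried first.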
> [...]
> I think longest-token semantics have to trump minimal matching here,
> and my argument is this. Most uses of *? have additional information
> on what terminates it, either implicitly in what it is matching, or
> explicitly in the next bit of regex. That is, you'd typically see
> either
> foo\w+? | fooba
> or
> foo.+? <wb> | fooba
>
> In either case, the clear intent is to match foobar over fooba.
> Therefore I think the DFA matcher just strips ? and does its ordinary
> character by character match, relying on that extra info to match
> the real extent of the quantifier.
Does this still hold true for a non-greedy quantifier in the
middle of an expression? I.e., does

    "foobazbar deborah" ~~ /foo .+? b.r | fooba | foobazb /

match "foobazbar debor" (what the DFA sees once the ? is stripped)
rather than "foobazbar"?
(I fully grant that the examples I'm coming up with here
may be nonsensical in real applications, but I'm
just exploring the space a bit.)
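
The two candidate answers can be checked mechanically with a small
Python sketch. This is an analogy only: Python's re module has no
longest-token semantics, so it shows just what the first alternative
matches with and without the ?. The whitespace from the Perl 6 pattern
is dropped because it is not significant there, and the variable names
are made up for the illustration.

```python
import re

text = "foobazbar deborah"

# First alternative as written: non-greedy, stops at the first "b.r"
lazy = re.match(r"foo.+?b.r", text).group()

# Same alternative with the '?' stripped: greedy, runs to the last "b.r"
greedy = re.match(r"foo.+b.r", text).group()

# The other two alternatives are literals with fixed lengths
literal_lens = [len("fooba"), len("foobazb")]

print(repr(lazy), len(lazy))
print(repr(greedy), len(greedy))
print(literal_lens)
```

With the ? stripped, the first alternative claims 15 characters
("foobazbar debor"); read non-greedily, it claims 9 ("foobazbar").
Either way it is longer than both literals (5 and 7 characters).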
Pm