On Thu, Sep 06, 2007 at 12:37:37PM -0700, Larry Wall wrote:
> On Thu, Sep 06, 2007 at 01:25:12PM -0500, Patrick R. Michaud wrote:
> : On a somewhat similar question, what happens with a pattern
> : such as
> : 
> :     "foobar" ~~ / foo.+? | fooba /
> : 
> : The LHS initially matches "foob", but with backtracking could
> : eventually match "foobar".  Do the longest-token semantics
> : in this case cause the RHS to be dispatched first, even
> : though the token declaration of the LHS _could_ match a 
> : longer token prefix?  
> 
> Yow.  ICATBW.  Non-greedy matching is somewhat antithetical to
> longest-token matching.  

I agree.  One thought I had was that perhaps non-greedy matching
could also terminate the token prefix.
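A rough sketch of that idea follows. It is Python, with the stdlib `re` module standing in for a real Perl 6 engine, and the helper names `longest_prefix_len` and `dispatch` are hypothetical. Each alternative is split by hand into a token prefix, which stops just before the first non-greedy quantifier, and the full pattern. Alternatives are tried in order of their longest token prefix, and the remainder is matched by ordinary backtracking.

```python
import re

def longest_prefix_len(pattern, text):
    # Length of the longest prefix of text that pattern matches in full,
    # or -1 if none. Greediness does not change which strings a pattern
    # accepts, so any `?` on a quantifier is irrelevant here.
    compiled = re.compile(pattern)
    for n in range(len(text), -1, -1):
        if compiled.fullmatch(text, 0, n):
            return n
    return -1

def dispatch(alternatives, text):
    # alternatives: list of (token_prefix, full_pattern) pairs. The split is
    # chosen by hand (hypothetical): the token prefix ends just before the
    # first non-greedy quantifier.
    scored = [(longest_prefix_len(prefix, text), i, full)
              for i, (prefix, full) in enumerate(alternatives)]
    # Longest token prefix first; ties keep declaration order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    for length, _, full in scored:
        if length < 0:
            continue  # this alternative's token prefix cannot match at all
        m = re.match(full, text)  # ordinary backtracking for the rest
        if m:
            return m.group(0)
    return None

print(dispatch([("foo", "foo.+?"), ("fooba", "fooba")], "foobar"))
print(re.match(r"foo.+?|fooba", "foobar").group(0))  # Python's ordered alternation
```

Under this rule, `/ foo.+? | fooba /` on "foobar" dispatches `fooba` first (a 5-character token prefix against 3 for `foo`) and matches "fooba". Python's own leftmost-first alternation takes the LHS and stops at "foob".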

> [...]
> I think longest-token semantics have to trump minimal matching here,
> and my argument is this.  Most uses of *? have additional information
> on what terminates it, either implicitly in what it is matching, or
> explicitly in the next bit of regex.  That is, you'd typically see
> either
>     foo\w+? | fooba
> or
>     foo.+? <wb> | fooba
> 
> In either case, the clear intent is to match foobar over fooba.
> Therefore I think the DFA matcher just strips ? and does its ordinary
> character by character match, relying on that extra info to match
> the real extent of the quantifier.
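The "strip the ?" rule can be sketched the same way. Because `.+?` and `.+` accept exactly the same strings, a matcher that looks only at the language, as a DFA does, never sees the `?`. This is again a Python sketch using `re`. The helper `longest_token_match` is hypothetical, and it makes two assumptions of its own: it treats every alternative as wholly declarative, and it breaks ties in favour of the earlier alternative. The Perl 6 spec may resolve either differently.

```python
import re

def longest_prefix_len(pattern, text):
    # Longest prefix of text that pattern matches in full, or -1.
    # A lazy `+?` accepts the same strings as `+`, so the `?` has no effect.
    # Caveat: zero-width assertions such as \b treat position n as the
    # end of the string, so this is only an approximation for them.
    compiled = re.compile(pattern)
    for n in range(len(text), -1, -1):
        if compiled.fullmatch(text, 0, n):
            return n
    return -1

def longest_token_match(alternatives, text):
    # Pick the alternative with the longest matching prefix (ties go to
    # the earlier one) and return that prefix as the match.
    best_len, best_alt = -1, None
    for alt in alternatives:
        n = longest_prefix_len(alt, text)
        if n > best_len:
            best_len, best_alt = n, alt
    if best_alt is None:
        return None
    return text[:best_len]

for lhs in (r"foo.+?", r"foo\w+?", r"foo.+?\b"):  # \b stands in for <wb>
    print(lhs, "->", longest_token_match([lhs, "fooba"], "foobar"),
          "| Python re:", re.match(lhs + "|fooba", "foobar").group(0))
```

The longest-token column gives "foobar" for all three left-hand sides. Python's ordered alternation stops at "foob" for the first two and reaches "foobar" only in the `\b` case, where the pattern itself says where the quantifier ends.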

Does this still hold true for a non-greedy quantifier in the
middle of an expression?  I.e., does

    "foobazbar deborah" ~~ /foo .+? b.r | fooba | foobazb /

match "foobazbar debor" (and not the minimal "foobazbar")?
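Here is a Python sketch that applies the same language-only rule to this example. `re.VERBOSE` stands in for Perl 6's whitespace-insensitive regexes, and the names `TEXT`, `ALTERNATIVES` and `longest_prefix_len` are hypothetical. As before, it assumes each alternative is wholly declarative and that ties go to the earlier alternative.

```python
import re

TEXT = "foobazbar deborah"
# Perl 6 ignores whitespace in regexes; re.VERBOSE does the same here.
ALTERNATIVES = ["foo .+? b.r", "fooba", "foobazb"]

def longest_prefix_len(pattern, text):
    # Longest prefix of text that pattern matches in full, or -1. Laziness
    # (`+?`) does not change which strings match, so it is ignored here,
    # as a DFA would ignore it.
    compiled = re.compile(pattern, re.VERBOSE)
    for n in range(len(text), -1, -1):
        if compiled.fullmatch(text, 0, n):
            return n
    return -1

lengths = [longest_prefix_len(alt, TEXT) for alt in ALTERNATIVES]
# max() returns the first maximal index, so ties go to the earlier alternative.
winner = max(range(len(ALTERNATIVES)), key=lambda i: lengths[i])
print("token lengths:", dict(zip(ALTERNATIVES, lengths)))
if lengths[winner] < 0:
    print("longest-token match: none")
else:
    print("longest-token match:", TEXT[:lengths[winner]])

# For contrast: leftmost-first alternation with a lazy quantifier.
backtracking = re.match("|".join(ALTERNATIVES), TEXT, re.VERBOSE)
print("Python re:", backtracking.group(0) if backtracking else None)
```

The token lengths come out as 15, 5 and 7, so the first alternative wins with "foobazbar debor". A leftmost-first backtracking engine also takes the first alternative, but lazily, and stops at "foobazbar".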

(I fully grant that the examples I'm coming up with here
may be nonsensical in real applications, but I'm
just exploring the space a bit.)

Pm
