On Thu, Sep 06, 2007 at 12:37:37PM -0700, Larry Wall wrote:
> On Thu, Sep 06, 2007 at 01:25:12PM -0500, Patrick R. Michaud wrote:
> : On a somewhat similar question, what happens with a pattern
> : such as
> :
> :     "foobar" ~~ / foo.+? | fooba /
> :
> : The LHS initially matches "foob", but with backtracking could
> : eventually match "foobar".  Do the longest-token semantics
> : in this case cause the RHS to be dispatched first, even
> : though the token declaration of the LHS _could_ match a
> : longer token prefix?
>
> Yow.  ICATBW.  Non-greedy matching is somewhat antithetical to
> longest-token matching.

I agree.  One thought I had was that perhaps non-greedy matching
could also terminate the token prefix.

> [...]
> I think longest-token semantics have to trump minimal matching here,
> and my argument is this.  Most uses of *? have additional information
> on what terminates it, either implicitly in what it is matching, or
> explicitly in the next bit of regex.  That is, you'd typically see
> either
>     foo\w+? | fooba
> or
>     foo.+? <wb> | fooba
>
> In either case, the clear intent is to match foobar over fooba.
> Therefore I think the DFA matcher just strips ? and does its ordinary
> character by character match, relying on that extra info to match
> the real extent of the quantifier.

Does this still hold true for a non-greedy quantifier in the
middle of an expression... ?  I.e.,

    "foobazbar deborah" ~~ /foo .+? b.r | fooba | foobazb /

matches "foobazbar debor" ?

(I completely grant that the examples I'm coming up with here may
be completely nonsensical in real application, but I'm just
exploring the space a bit.)

Pm
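
For concreteness, here is a rough sketch of the two readings being
compared, written in Python rather than Perl 6. It is only an
approximation under stated assumptions: Python's `re` has no
longest-token alternation, so the "strip the ?" idea is emulated by
dropping the lazy marker from each alternative, matching each one
greedily, and keeping the longest result. The helper names
(`first_alternative_match`, `longest_with_stripped_lazy`) are
hypothetical, the patterns are spelled without Perl 6's insignificant
whitespace, and greedy backtracking only coincides with a true DFA
longest match for simple patterns like these.

```python
import re


def first_alternative_match(alternatives, text):
    """Ordinary backtracking alternation: alternatives are tried left to
    right, and lazy quantifiers stop as early as they can."""
    pattern = "|".join(f"(?:{alt})" for alt in alternatives)
    m = re.match(pattern, text)
    return m.group(0) if m else None


def longest_with_stripped_lazy(alternatives, text):
    """Emulation of the 'strip ? and take the longest' idea (an assumption
    about how it might behave, not an actual Perl 6 implementation)."""
    best = None
    for alt in alternatives:
        # Naive rewrite: "+?" becomes "+", "*?" becomes "*".
        # It does not handle escaped quantifier characters.
        greedy = re.sub(r"([+*])\?", r"\1", alt)
        m = re.match(greedy, text)
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best


if __name__ == "__main__":
    first = ["foo.+?", "fooba"]
    print(first_alternative_match(first, "foobar"))      # foob
    print(longest_with_stripped_lazy(first, "foobar"))   # foobar

    second = ["foo.+?b.r", "fooba", "foobazb"]
    text = "foobazbar deborah"
    print(first_alternative_match(second, text))         # foobazbar
    print(longest_with_stripped_lazy(second, text))      # foobazbar debor
```

Under this emulation the second example does come out as
"foobazbar debor", which is the case being asked about: once the ? is
stripped, the quantifier in the middle of the expression runs to the
last possible "b.r" rather than the first.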