On Thu, Sep 06, 2007 at 01:25:12PM -0500, Patrick R. Michaud wrote:
: > Were we using the procedural conjunction:
: >
: >     "foobar" ~~ / <[a..z]>+ && [ ... ] /;
: >
: > I would guess that the LHS matches as much as it can ("foobar"), then
: > the RHS matches "foo" [...and then backtracks the LHS until a
: > conjunctional match is found...]
: >
: > Or it's much simpler than that and both of the regexes above just fail
: > because of the greediness of C<+> and there is no intra-conjunction
: > backtracking.
:
: I think we definitely allow intra-conjunction backtracking.
: PGE implements it that way.
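To make the intra-conjunction backtracking concrete, here is a small Python sketch. Python's `re` has no `&&`, so the conjunction is emulated by hand. This is not how PGE implements it.

The sketch rests on a few assumptions:

- `<[a..z]>+` translates to `[a-z]+`.
- `[ ... ]` translates to three arbitrary characters (`...`).
- `conjunction_match` is a hypothetical helper.

It tries candidate end positions longest-first, as a greedy LHS would when backtracking. It accepts the first span that both sides match in full.

```python
import re

def conjunction_match(text, lhs, rhs):
    """Hypothetical emulation of `lhs && rhs` anchored at position 0.

    Both sub-patterns must match exactly the same span. Candidate ends are
    tried longest-first, mimicking a greedy LHS that gives back one
    character at a time until the RHS agrees on the extent.
    """
    for end in range(len(text), -1, -1):
        span = text[:end]
        if re.fullmatch(lhs, span) and re.fullmatch(rhs, span):
            return span
    return None  # no common span: the conjunction fails

# Assumed translation: <[a..z]>+ -> [a-z]+ and [ ... ] -> ... (three of any char)
print(conjunction_match("foobar", r"[a-z]+", r"..."))  # foo
```

Without the backtracking loop (checking only the LHS's greedy "foobar"), the conjunction would fail. That is the alternative the quoted message rules out.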
That's what I think.

: On a somewhat similar question, what happens with a pattern
: such as
:
:     "foobar" ~~ / foo.+? | fooba /
:
: The LHS initially matches "foob", but with backtracking could
: eventually match "foobar". Do the longest-token semantics
: in this case cause the RHS to be dispatched first, even
: though the token declaration of the LHS _could_ match a
: longer token prefix?

Yow. ICATBW. Non-greedy matching is somewhat antithetical to
longest-token matching. But basically it boils down to this:
Does the longest-token matcher ignore the ? and do

    foo.+ | fooba

or is there an implicit ordering above and beyond the DFA engine of

    foob | fooba ||
    fooba | fooba ||
    foobar | fooba ||

I think longest-token semantics have to trump minimal matching here,
and my argument is this. Most uses of *? have additional information
on what terminates it, either implicitly in what it is matching, or
explicitly in the next bit of regex. That is, you'd typically see
either

    foo\w+? | fooba

or

    foo.+? <wb> | fooba

In either case, the clear intent is to match foobar over fooba.
Therefore I think the DFA matcher just strips ? and does its ordinary
character by character match, relying on that extra info to match
the real extent of the quantifier.

Larry
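Python's `re` has no longest-token alternation, so the sketch below is only a toy model. It is not PGE's algorithm or the spec's.

The hypothetical helper `longest_token_alternation` works in two phases:

1. It dispatches to the alternative with the longest prefix match, optionally with `?` stripped as proposed above.
2. It runs the chosen alternative as written.

Python's `\b` stands in for `<wb>`. The helper `ordered_alternation` shows ordinary first-match backtracking for comparison.

```python
import re

def ordered_alternation(text, alts):
    """Backtracking alternation as in Python's `re`: the first alternative
    (in source order) that matches at position 0 wins."""
    m = re.match("|".join(f"(?:{a})" for a in alts), text)
    return m.group() if m else None

def strip_minimal(pattern):
    """Naively make non-greedy quantifiers greedy. Toy patterns only:
    an escaped plus followed by ? would be mangled."""
    return pattern.replace("+?", "+").replace("*?", "*")

def longest_token_alternation(text, alts, strip=True):
    """Hypothetical two-phase model of longest-token alternation.

    Phase 1: dispatch to the alternative whose prefix match is longest
    (with ? stripped when strip=True); earlier alternatives win ties.
    Phase 2: run the chosen alternative as written, so its minimal
    quantifier and any terminator decide the real extent of the match.
    """
    best_alt, best_len = None, -1
    for alt in alts:
        m = re.match(strip_minimal(alt) if strip else alt, text)
        if m and len(m.group()) > best_len:
            best_alt, best_len = alt, len(m.group())
    if best_alt is None:
        return None
    return re.match(best_alt, text).group()

text = "foobar"
print(ordered_alternation(text, [r"foo.+?", r"fooba"]))                     # foob
print(longest_token_alternation(text, [r"foo.+?", r"fooba"], strip=False))  # fooba
print(longest_token_alternation(text, [r"foo.+?", r"fooba"]))               # foob
print(longest_token_alternation(text, [r"foo.+?\b", r"fooba"]))             # foobar
```

The second call honours `?` during dispatch, so the RHS wins with "fooba". The third call strips `?`, which sends the match into the LHS. However, with nothing after `.+?`, the quantifier still stops at "foob". The last call adds a terminator (`\b` here), and the same dispatch then yields "foobar". In this model, the strip-`?` approach depends on that extra terminating information.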