On Wed, 28 Aug 2002, Dan Sugalski wrote: > At 10:57 AM -0400 8/28/02, Deven T. Corzine wrote: > >Would it be _possible_ to create a non-backtracking implementation of a > >Perl 6 pattern engine, or does the existence of backtracking-related > >operators preclude this possibility in advance? > > In general, no of course it's not possible to create a > non-backtracking perl regex engine. Far too much of perl's regexes > requires backtracking.
Given that Perl 5 regex's are no longer regular (much less Perl 6), I'm sure this is probably true. There may be a regular subset which could be implemented without backtracking if problematic features are avoided, but surely a complete non-backtracking implementation is beyond reach. On the other hand, :, ::, ::: and <commit> don't necessarily need to be a problem if they can be treated as hints that can be ignored. If allowing the normal engine to backtrack despite the hints would change the results, that might be a problem. I don't know; <cut> may pose special problems. Even if the new operators can't work without backtracking, maybe it doesn't matter, since there's surely a few others inherited from Perl 5 as well... > That doesn't mean you can't write one for a specific subset of perl's > regexes, though. A medium-term goal for the regex engine is to note > where a DFA would give correct behaviour and use one, rather than > going through the more expensive generalized regex engine we'd > otherwise use. I think this is a more realistic goal, and more or less what I had in mind. I believe there are many subpatterns which might be beneficial to compile to a DFA (or DFA-like) form, where runtime performance is important. For example, if a pattern is matching dates, a (Jan|Feb|Mar|Apr|...) subpattern would be more efficient to implement as a DFA than with backtracking. With a large amount of data to process, that represent significant savings... > If you want to head over to [EMAIL PROTECTED] and pitch in on > the regex implementation (it's being worked on now) that'd be great. I'd like to do that, if I can find the time. It would be interesting to make a small experimental prototype to see if DFA construction could really improve performance over backtracking, but it would probably need to be a very restricted subset of regex operations to test the idea... However, while I'm still on perl6-language, I have two language issues to discuss first: (1) Can we have a ":study" modifier in Perl 6 for patterns? It could be a no-op if necessary, but it could take the place of Perl 5's "study" operator and indicate that the programmer WANTS the pattern optimized for maximum runtime speed, even at the cost of compile time or memory. (Hey, how about a ":cram" modifier for extreme optimization? :-) (2) Would simple alternation impede DFA-style optimizations? Currently, a pattern like (Jun|June) would never match "June" because the "leftmost" match "Jun" would always take precedence, despite the normal "longest-match" behavior of regexes in general. This example could be implemented in a DFA; would that always be the case? Would it be better for the matching of (Jun|June) to be "undefined" and implementation-dependent? Or is it best to require "leftmost" semantics? Deven