On Wed, Aug 28, 2002 at 10:36:54AM -0700, Larry Wall wrote: > That is a worthy consideration, but expressiveness takes precedence > over it in this case. DFAs are really only good for telling you > *whether* and *where* a pattern matches as a whole. They are > relatively useless for telling you *how* a pattern matches.
Actually, DFAs can also tell you *how* a pattern matches matches. For instance the RE library (http://www.sourceforge.net/projects/libre/) is DFA-based and support a lot of Perl 5 regular expression features: - submatches - left-most matching semantics - greedy and non-greedy operators (*, *?, ?, ??) - zero-length assertions such as ^, $, \b. I don't know how to implement back-references and embedded Perl code using a DFA. Look-ahead and look-behind assertion can probably be implemented, though this looks tricky. But, these features are not used that often. So, I believe than most "real-life" Perl 5 regular expressions can be handled by a DFA. > Add to that the fact that most real-life patterns don't generally do > much backtracking, because they're written to succeed, not to fail. > This pattern never backtracks, for instance: > > my ($num) = /^Items: (\d+)/; Another advantage of DFAs over NFAs is that the core of a DFA-based pattern matching implementation is a small simple loop which is executed very efficiently by modern CPUs. On the other hand, the core of a NFA-based implementation is much larger and much more complex, and I'm not sure it is executed as efficiently. In particular, there is probably a lot more branch mispredictions. As an example of what performance improvement one can sometimes achieve using a DFA-based rather than a NFA-based implementation, here are the results I get on my computer for the regular expression benchmark from the "Great Computer Language Shootout". (http://www.bagley.org/~doug/shootout/bench/regexmatch/) Perl 5 3.49s OCaml with PCRE (NFA-based) 3.17s OCaml with RE (DFA-based) 0.34s -- Jerome