On Wed, Aug 28, 2002 at 10:36:54AM -0700, Larry Wall wrote:
> That is a worthy consideration, but expressiveness takes precedence
> over it in this case.  DFAs are really only good for telling you
> *whether* and *where* a pattern matches as a whole.  They are
> relatively useless for telling you *how* a pattern matches.

Actually, DFAs can also tell you *how* a pattern matches matches.  For
instance the RE library (http://www.sourceforge.net/projects/libre/)
is DFA-based and support a lot of Perl 5 regular expression features:
- submatches
- left-most matching semantics
- greedy and non-greedy operators (*, *?, ?, ??)
- zero-length assertions such as ^, $, \b.

I don't know how to implement back-references and embedded Perl code
using a DFA.  Look-ahead and look-behind assertion can probably be
implemented, though this looks tricky.  But, these features are not
used that often.  So, I believe than most "real-life" Perl 5 regular
expressions can be handled by a DFA.

> Add to that the fact that most real-life patterns don't generally do
> much backtracking, because they're written to succeed, not to fail.
> This pattern never backtracks, for instance:
> 
>     my ($num) = /^Items: (\d+)/;

Another advantage of DFAs over NFAs is that the core of a DFA-based
pattern matching implementation is a small simple loop which is
executed very efficiently by modern CPUs.  On the other hand, the core
of a NFA-based implementation is much larger and much more complex,
and I'm not sure it is executed as efficiently.  In particular, there
is probably a lot more branch mispredictions.

As an example of what performance improvement one can sometimes
achieve using a DFA-based rather than a NFA-based implementation,
here are the results I get on my computer for the regular expression
benchmark from the "Great Computer Language Shootout".
(http://www.bagley.org/~doug/shootout/bench/regexmatch/)

    Perl 5                              3.49s
    OCaml with PCRE (NFA-based)         3.17s
    OCaml with RE (DFA-based)           0.34s

-- Jerome

Reply via email to