Wow. Since you went to the trouble of writing all this up, it really
ought to go in a FAQ somewhere.

On Thu, Aug 08, 2002 at 12:05:00AM -0400, Mark J. Reed wrote:
> Finite state machines can match regular expressions whose only operations
> are closure (*), alternation (|), and grouping.  Some of the other things

Don't forget optional subexpressions (?). Not sure what the official
name is.

They can also handle negation, but nobody ever seems to put that into
the regex syntax. Probably because the exact semantics get fuzzy when
you're not anchored at the beginning and end, and any particular
semantics you might pick are deceptive and/or useless.

> You can turn an nondeterministic FA into a deterministic one (a DFA)
> by a fairly straightforward algorithm.  Basically, each state in the
> DFA corresponds to a set of all the possible states the NFA *could*
> be in at a given point in a string.  So the number of states in
> a DFA built from an NFA can be huge; but the machine never has to
> backtrack and try again because it knows exactly which transition
> to make at each step.  So in general, NFAs take less memory but
> run slower, while DFAs run faster but take more memory.  

And if you implement the NFA the way you're implying (as a traditional
NFA), you'll always get the same answer to whether or not the
expression matches, but assuming the answer is yes the NFA and DFA
might match different parts of the string. Which doesn't contradict
anything you're saying, but it's good to be aware that the theoretical
purpose for these things (deciding whether a given string matches an
expression or not) is not the same as the common practical purpose
(finding the exact substring within a given string that matches the
expression).

> (Historically, the grep program has used an NFA, while egrep has
> used a DFA.)

That sounds backwards.

You should probably also mention somewhere that once you break free of
the bounds of regularity, it's much easier to implement many features
and irregular constructs with an NFA rather than a DFA. Like capturing
parentheses, alternation that prefers the choice the expression gives
first, etc.

> A step up from regular languages  are context-free languages, which can
> count. C-F languages can be recognized by a variety of different
> mechanisms, but not by FSMs.

Anyone happen to know where pushdown automata fit in this list? Can
they handle context-sensitive, just context-free, or some other
subset?

Reply via email to