[SAtalk] Re: A faster and more scalable matching engine.

Scott A Crosby Wed, 26 Nov 2003 16:38:16 -0800

On Wed, 26 Nov 2003 12:58:19 -0500, Kris Deugau <[EMAIL PROTECTED]> writes:

> Pedro Sam wrote:
> > hehe, we can give a state to each possible combination of HITS for the
> > rules.  So if rules 1, 3, 5, 7 hit, we give that a state, and if 2, 4,
> > 6, 8 hit, we give it another state, and so on...  I think they call it
> > subset construction or something...
> 
> Which is next to useless for even 30-40 rules.  You would have to have
> n![1] states for n rules- there is NO way to determine which individual
> rules will trigger in advance.

In the worst case, precomputing the transition table for the DFA is
exponential with only one regexp. In practice, most regexps people
care about are much more well behaved, so the result seems to be about
linear in the sums of the lengths of all of the regexps.

Even for ill-formed rules, the table can also be constructed lazily at
runtime. See the dragon book for details on both algorithms. 

To keep track of all matching regexps, I annotate the DFA states with
not one rule for backtracking, but a list of all matching rules. Then
log them during interpretation and generate a union of all of those.

Scott

-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
Does SourceForge.net help you be more productive?  Does it
help you create better code?  SHARE THE LOVE, and help us help
YOU!  Click Here: http://sourceforge.net/donate/
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk

[SAtalk] Re: A faster and more scalable matching engine.

Reply via email to