On Thu, 11 Dec 2003 07:31:10 -0800, "Gary Funck" <[EMAIL PROTECTED]> writes:

> > The major catch with this particular implementation is that it cannot
> > deal with nondeterministic transformations. What this means is that
> > any consequent for a substitute rule must be a single character ('4
> > -> for' would be bad). That's not something that I think is going to be
> > a real problem in practice. Another problem is that with a few good
> > transforming rulesets, you've just increased the regexp ruleset
> > 5x. The matching engine has to support that without becoming even more
> > of a resource hog. This would be a problem for the perl regexp engine that
> > SA uses, but not for an automata based matcher like what I have been
> > proposing and implementing.

> One implementation might be to convert the rewrite rules into an
> equivalent flex description, and let flex generate the automaton in
> C. Compile the C, and build a Perl binding to it.

I considered that and did a prototype (which was useful for
performance estimates and automata-size estimates) before deciding
that it wouldn't work well. In particular, a flex scanner can't report
overlapping matches or cases where more than one rule matches.
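
To make the limitation concrete, here is a toy emulation in Python. It
is not flex itself, only a sketch of the default scanner behaviour
(longest match wins, matched input is consumed), and the rule names R1
and R2 and their patterns are made up for illustration.

```python
import re

# Hypothetical rules: R2's text is contained inside R1's text.
RULES = {"R1": r"foo bar", "R2": r"bar"}

def lexer_style_hits(text, rules):
    """Emulate a lexer-style scan: the longest match at the current
    position wins (the earlier rule wins ties) and its input is consumed."""
    compiled = {name: re.compile(pat) for name, pat in rules.items()}
    hits, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in compiled.items():
            m = rx.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            pos += 1              # no rule matched here: skip one character
        else:
            hits.append(best_name)
            pos += best_len       # consumed text is never rescanned
    return hits

def all_hits(text, rules):
    """What a rule-based filter wants: every rule that matches anywhere."""
    return sorted(name for name, pat in rules.items() if re.search(pat, text))

print(lexer_style_hits("foo bar", RULES))
print(all_hits("foo bar", RULES))
```

On the input "foo bar" the lexer-style scan reports only R1, because
the text that R2 would have matched was already consumed, while the
second function reports both rules.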

It also blows up with an idiom used all over SA: 'foo.{,20}bar'. This
can be worked around with two rules:

foo   { mark current position as a match for foo }
bar   { check if current position is within 20 of marked position for foo
        and record a rule match if so }

Combined with rules like 'foo.{,20}baz', 'bang.{,20}bar', etc., this
workaround just exacerbates the problem of flex not supporting
multiple matches or overlapping matches.

Scott


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
