> 
> > One implementation might be to convert the rewrite rules into an
> > equivalent flex description, and let flex generate the automaton in
> > C. Compile the C, and build a Perl binding to it.

Scott replied:
> I considered that and did a prototype (which was useful for
> performance estimates and automata-size estimates) before deciding
> that it won't work well. In particular, this can't handle overlapping
> matches or cases where more than one rule matches.
>

OK. In your earlier example it looked as if your goal was to focus on
rewrite rules that take a stream in and make simple transformations
to produce a new (unobfuscated) output stream. But from the above it
seems that you would like to rewrite the existing SA scoring patterns
into some sort of automaton that traverses the input stream in more or
less a single pass, outputting a list of the rules that matched?

Note that if you want to handle cases where more than one rule matches,
you'll likely have to implement backtracking. Or in limited situations
you can expand the alternatives at their point of invocation, but this
might lead to a combinatorial explosion in the size of the automaton.
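
To make the multiple-match problem concrete, here is a rough Python
sketch. It is not flex: Python's re module is a backtracking engine,
and the rule names and patterns are made up for illustration. The
first function only mimics the lexer habit of consuming each match
before looking for the next one; the second runs every rule on its
own.

```python
import re

# Hypothetical rule names and patterns, for illustration only.
RULES = {"R1": r"foobar", "R2": r"bar"}

def scan_like_lexer(text):
    """One combined pattern; each match consumes its input, lexer-style."""
    combined = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in RULES.items()))
    # finditer yields non-overlapping matches, so text inside a match is never rescanned.
    return {m.lastgroup for m in combined.finditer(text)}

def scan_each_rule(text):
    """Run every rule independently, so overlapping hits are all reported."""
    return {name for name, pat in RULES.items() if re.search(pat, text)}
```

On the input "xfoobarx" the combined scan reports only R1, because
the "bar" inside "foobar" has already been consumed, while the
per-rule scan reports both R1 and R2 at the cost of one pass per rule.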
 
> It also blows up with an idiom used all over SA 'foo.{,20}bar' This
> can be worked around by two rules:
> 
> foo   { mark current position as a match for foo }
> bar   { check if current position is within 20 of marked position 
> for foo and record a rule match if so.}
> 

A problem with that approach is the automaton might keep scanning
all the way to the end of the input stream rather than bounding
the search. This leads to a lot of extra work.
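
For reference, the mark-and-check idea can be sketched in Python
roughly as follows. This is only my reading of the two rules above,
with invented names (within_window, max_gap), and it treats foo and
bar as plain literals. It also ignores the detail that '.' normally
does not match a newline.

```python
def within_window(text, first="foo", second="bar", max_gap=20):
    """True if `second` starts at most `max_gap` chars after some `first` ends."""
    last_end = None  # offset just past the most recent occurrence of `first`
    for i in range(len(text)):
        # Mark: an occurrence of `first` ends exactly at offset i.
        if i >= len(first) and text.startswith(first, i - len(first)):
            last_end = i
        # Check: `second` starts here and the mark is close enough.
        if last_end is not None and i - last_end <= max_gap and text.startswith(second, i):
            return True
    return False
```

This version stops at the first hit, but when there is no hit it
still walks the whole input, which is the extra work I mean.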

I don't know how the regex package implements the {m,n} construct,
but one way is to simply enumerate the allowed repetition counts
as alternatives:
   a{1,5}b -> (a | aa | aaa | aaaa | aaaaa)b
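
A small Python sketch of that enumeration, in case it helps. The
helper name expand_bounded is made up, and it assumes the repeated
item is a single character or an already-grouped subpattern. I am not
claiming any particular regex package works this way.

```python
import re

def expand_bounded(atom, lo, hi):
    """Rewrite atom{lo,hi} as an explicit alternation of repeat counts."""
    alternatives = [atom * n for n in range(lo, hi + 1)]
    return "(?:" + "|".join(alternatives) + ")"

# a{1,5}b written out by enumeration
pattern = expand_bounded("a", 1, 5) + "b"
```

Note the growth: a bound of n yields n alternatives whose total length
is quadratic in n, so something like .{0,20} already expands to 21
alternatives.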

> which combined with rules like 'foo.{,20}baz' 'bang.{,20}bar',
> etc. just exacerbates the problem of flex not supporting multiple
> matches or overlapping matches.

See above; it sounds like you might be looking for a parser with
backtracking capability.



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
