> > One implementation might be to convert the rewrite rules into an
> > equivalent flex description, and let flex generate the automaton in
> > C. Compile the C, and build a Perl binding to it.
Scott replied:

> I considered that and did a prototype (which was useful for
> performance estimates and automata-size estimates) before deciding
> that it won't work well. In particular, this can't handle overlapping
> matches or cases where more than one rule matches.

OK. In your earlier example it looked as if your goal was to focus on
rewrite rules which took a stream in and made simple transformations to
produce a new (unobfuscated) output stream. But from the above it seems
that you would like to rewrite existing SA scoring patterns into some
sort of automaton that will traverse the input stream in more or less a
single pass, outputting a list of the rules that matched?

Note that if you want to handle cases where more than one rule matches,
you'll likely have to implement backtracking. In limited situations you
can instead expand the alternatives at their point of invocation, but
this might lead to a combinatorial explosion in the size of the
automaton.

> It also blows up with an idiom used all over SA 'foo.{,20}bar' This
> can be worked around by two rules:
>
>   foo { mark current position as a match for foo }
>   bar { check if current position is within 20 of marked position
>         for foo and record a rule match if so.}

A problem with that approach is that the automaton might keep scanning
all the way to the end of the input stream rather than bounding the
search. This leads to a lot of extra work. I don't know how the regex
package implements the {m,n} construct, but one way is to simply
enumerate the allowed repetition counts:

  a{1,5}b -> (a | aa | aaa | aaaa | aaaaa)b

> which combined with rules like 'foo.{,20}baz' 'bang.{,20}bar',
> etc. just exacerbates the problem of flex not supporting multiple
> matches or overlapping matches.

See above; it sounds like you might be looking for a parser with
backtracking capability.
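For what it's worth, the mark-and-check workaround quoted above can be sketched in a few lines. This is only a rough illustration in Python, not flex output and not anything SA actually does: the rule names, the RULES table and the scan() function are all made up for the example, each rule is reduced to two literal strings plus a maximum gap, and the newline handling of '.' is ignored. It walks the input once, keeps only the most recent mark per rule, and reports every rule that fires, including several at the same position.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: name -> (first literal, second literal, max gap).
# Each entry stands in for a pattern of the form 'first.{,max_gap}second'.
RULES: Dict[str, Tuple[str, str, int]] = {
    "FOO_BAR": ("foo", "bar", 20),
    "FOO_BAZ": ("foo", "baz", 20),
    "BANG_BAR": ("bang", "bar", 20),
}


def scan(text: str, rules: Dict[str, Tuple[str, str, int]] = RULES) -> List[Tuple[str, int]]:
    """Return (rule name, offset where the second literal starts) for every hit."""
    last_end: Dict[str, int] = {}  # rule name -> end offset of the latest first literal
    hits: List[Tuple[str, int]] = []
    for i in range(len(text) + 1):
        for name, (first, second, max_gap) in rules.items():
            # Mark: the first literal ends exactly at offset i.
            if i >= len(first) and text.startswith(first, i - len(first)):
                last_end[name] = i
            # Check: the second literal starts at offset i and the latest
            # mark is no more than max_gap characters behind it.
            if name in last_end and text.startswith(second, i):
                if i - last_end[name] <= max_gap:
                    hits.append((name, i))
    return hits


if __name__ == "__main__":
    print(scan("foo bang bar"))
```

Keeping only the latest mark is enough here because the most recent occurrence of the first literal is always the closest one. The sketch tests every rule at every offset, so it says nothing about automaton size; it only shows that the bookkeeping for multiple and overlapping matches is cheap once the literals have been found.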
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk