On Wed, 26 Nov 2003 09:31:36 +0600, Alexander Litvinov <[EMAIL PROTECTED]> writes:

> Heh, it seems it would be nice to make SA scan messages faster. If I
> understand your idea correctly, you don't want to run the regexps one
> by one, but to write one state machine for all the regexps and walk
> through its states over the mail, but... I understand how this may be
> faster (linear time in the message size) if SA had one regexp for
> detecting spam. But SA has multiple rules that can fire
> simultaneously. For this situation I can't imagine a way to write
> the state machine.

It's relatively straightforward, and has been known for about 30
years. See the Dragon book, chapter 3. Add a few enhancements and you
get an engine with the same semantics that SA uses, recognizing
overlapping and concurrent patterns at all locations in the input.

If you want a taste of something that can almost do this, see GNU
flex. It is a lexer generator where you pass it a list of regexps and
snippets of C code and it matches them in parallel.
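
To make "one pass, many patterns, all offsets" concrete, here is a
minimal sketch. It is not flex and not what SA does: it swaps in the
Aho-Corasick construction, which handles literal strings only rather
than full regexps, because it fits in a few lines. The rule names and
the build_automaton/scan helpers are hypothetical. What it shows is a
single state machine reporting overlapping hits for several rules while
reading each input character once.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto/fail/output tables for a dict of {rule_name: literal}."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # state to fall back to on a mismatch
    out = [[]]    # rule names that match when a state is entered
    for name, text in patterns.items():
        state = 0
        for ch in text:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(name)
    # Breadth-first pass: compute failure links, shallow states first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also matches every rule its fallback state matches.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def scan(text, automaton):
    """Return (rule_name, index_of_last_matched_char) for every hit."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for name in out[state]:
            hits.append((name, pos))
    return hits

# Hypothetical rules; "agra" overlaps the tail of "viagra".
rules = {"VIAGRA": "viagra", "AGRA": "agra", "FREE": "free"}
print(scan("free viagra", build_automaton(rules)))
# -> [('FREE', 3), ('VIAGRA', 10), ('AGRA', 10)]
```

Both VIAGRA and AGRA fire at the same position, which is the
"multiple rules firing simultaneously" case from the question. A regexp
version needs the NFA-to-DFA machinery from the Dragon book instead of a
trie, but the scanning loop has the same shape.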

The worst case for such an engine is quadratic time, for any pattern.
Extrapolating from a somewhat related prototype that matched 5,000
patterns at all offsets in an input string, the typical case for such
an automaton compiler should be about 2-10x slower than a single linear
scan, or 5-20 MB/sec.

> Another issue: it seems to me the most time-consuming operation is
> the network tests.

Network tests don't consume CPU, but they do increase latency. Because
they don't consume CPU, you can parallelize them if you have enough
RAM: while one instance waits for a response, another instance is
processing a message. Most of the RAM problem can be solved with clever
queuing. Analyze an email for RBL tests, then queue it in memory until
the maximum, say 100MB, of RAM is used or until all RBL tests
reply. Then fully finish processing the message.
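
The queuing idea can be sketched roughly as follows, using Python's
asyncio. Everything here is a stand-in: rbl_lookup simulates a DNS
blocklist query with a sleep, cpu_rules is a toy replacement for the
CPU-bound rules, and Scanner and scan_all are hypothetical names. One
simplification to note: when the RAM budget is full, this sketch holds
back new messages rather than finishing already-queued ones early.

```python
import asyncio

MAX_QUEUED_BYTES = 100 * 1024 * 1024  # the "say 100MB" budget from the text

async def rbl_lookup(message):
    """Stand-in for an RBL query: pure waiting, no CPU used."""
    await asyncio.sleep(0.05)
    return "spam" in message

def cpu_rules(message):
    """Stand-in for the CPU-bound rules."""
    return message.count("!")

class Scanner:
    def __init__(self, budget=MAX_QUEUED_BYTES):
        self.budget = budget
        self.queued = 0                  # bytes of messages parked in memory
        self.room = asyncio.Condition()

    async def scan(self, message):
        size = len(message)
        async with self.room:
            # Admit a message only if it fits the budget (an oversized
            # message is still admitted when nothing else is queued).
            await self.room.wait_for(
                lambda: self.queued == 0 or self.queued + size <= self.budget)
            self.queued += size
        try:
            # The message is parked here; other messages run meanwhile.
            listed = await rbl_lookup(message)
            # Replies are in: finish processing the message.
            return cpu_rules(message) + (5 if listed else 0)
        finally:
            async with self.room:
                self.queued -= size
                self.room.notify_all()

async def scan_all(messages, budget=MAX_QUEUED_BYTES):
    scanner = Scanner(budget)
    return await asyncio.gather(*(scanner.scan(m) for m in messages))

if __name__ == "__main__":
    print(asyncio.run(scan_all(["hello!", "buy spam now!!"])))
```

With a large budget all lookups wait concurrently, so total latency
stays near that of a single lookup; with a tiny budget the messages are
handled one at a time, which is the RAM-for-throughput trade described
above.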

Scott


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
