On Sat, 11 May 2002, Mail Admin wrote:
> Hi, I want to use SpamAssassin on a system under really heavy load.
> I have 540,000 incoming emails daily. I know spamc/spamd do well
> under moderate load, but this is not enough. Did anybody think of
> rewriting SpamAssassin in C,

Yup. It's been suggested here before and, in fact, someone said that
they have done so.

> and maybe using a high-performance threading library like pth

Nope. This was very clever of them, too, because they didn't pay the
stupidly high costs that introducing threading would have imposed.
Instead they used a sane, tested and reasonable fork()-based
implementation, which gives all the benefits and none of the costs for
this workload.

> for a native daemon like spamd, and optimising the rule matching?

That's being done and, frankly, there are three things that are likely
to be hotspots in SpamAssassin:

* the Perl regexp engine running the rules.
* the need to walk a message larger than the L1 cache more than once.
* forking, forking, forking.
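
To make the first two points concrete, here is a minimal Python sketch, with Python's `re` standing in for Perl's regexp engine. The rule names, scores and patterns are made up for illustration; they are not real SpamAssassin rules. Each rule is its own search over the message, so the work grows roughly as rules times message size, and a big message gets walked again for each rule.

```python
import re

# Hypothetical rules: (name, score, pattern). Real SpamAssassin rules are
# defined in its config files and run by Perl; this only mimics the shape.
RULES = [
    ("MENTIONS_FREE", 1.0, re.compile(r"\bfree\b", re.IGNORECASE)),
    ("MONEY_BACK", 1.5, re.compile(r"money[- ]back guarantee", re.IGNORECASE)),
    ("ALL_CAPS_SUBJECT", 0.8, re.compile(r"^Subject: [A-Z !]+$", re.MULTILINE)),
]


def score_message(message: str) -> tuple[float, list[str]]:
    """Run every rule against the whole message and sum the scores of hits.

    Each pattern.search() is a separate scan of the text, so the work is
    roughly (number of rules) x (message size); a message larger than the
    CPU cache can be pulled through the cache again for each rule.
    """
    total = 0.0
    hits = []
    for name, score, pattern in RULES:
        # Scans the message again, until a match or the end of the text.
        if pattern.search(message):
            total += score
            hits.append(name)
    return total, hits


if __name__ == "__main__":
    msg = "Subject: FREE STUFF\n\nGet it free with a money-back guarantee.\n"
    print(score_message(msg))
```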

For the first, *nothing* that you do is likely to improve things much
other than rewriting the rules themselves; this can be done equally well
with Perl.
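
A small illustration of what rewriting a rule can buy, again in Python rather than Perl and with a hypothetical rule: the nested quantifier in `(\w+\s*)+!` can split a long run of word characters into groups in exponentially many ways, and tries them all before failing when there is no `!`. `\w\s*!` gives the same yes/no answer from `search()` without that blow-up, and no rewrite into C would fix the first one for you. Perl's engine has its own optimisations for some patterns like this, so treat this as a sketch of the idea and time your own rules rather than trusting any numbers from it.

```python
import re
import time

# Hypothetical rule: "a run of words followed by '!'". The nested quantifier
# lets the engine split a run of letters in exponentially many ways.
BACKTRACKING_RULE = re.compile(r"(\w+\s*)+!")

# Same yes/no answer for search(): any match of the rule above ends in a word
# character, optional whitespace and '!', and that tail alone also matches.
REWRITTEN_RULE = re.compile(r"\w\s*!")


def seconds_to_search(pattern: re.Pattern, text: str) -> float:
    """Wall-clock time for a single pattern.search(text) call."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # No '!' anywhere, so both rules must fail; only one fails quickly.
    # Each extra character roughly doubles the backtracking rule's time.
    text = "a" * 18
    print("backtracking:", seconds_to_search(BACKTRACKING_RULE, text))
    print("rewritten:   ", seconds_to_search(REWRITTEN_RULE, text))
```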

For the second, rewriting SpamAssassin to use a streaming, single-pass
algorithm would be (ahem) challenging. I don't suggest it. :)

> If nothing has been done on that, I am ready to help with such a
> project. If anybody is interested, please mail me and let's start.

Check the mailing list archives.

> If I am completely off the point and what I am thinking of will not
> help in getting better performance, please tell me :)

It's not likely to help, because you have not clearly identified where
the performance problems lie.

Every time you fork a process, you pay a huge cost. Avoiding that would
improve your throughput dramatically. Using spam[cd], you pay for at
least *three* forks per message, and probably closer to five.
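
For scale: 540,000 messages a day is 6.25 messages a second on average, and more at peak. A rough way to see what a fork costs on your own box is a micro-benchmark like this Python sketch (POSIX only). It times an empty child; a Perl spamd child carrying the compiled rule set will typically cost more to fork and reap, so treat the result as a lower bound rather than a spamd measurement.

```python
import os
import time


def per_message_fork_cost(iterations: int = 200) -> float:
    """Average seconds for one fork(), an immediate child exit and the
    parent's waitpid() (POSIX only; os.fork does not exist on Windows)."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            # Child: do no work, so only process creation/teardown is timed.
            # os._exit skips Python cleanup such as flushing inherited buffers.
            os._exit(0)
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / iterations


def per_message_call_cost(iterations: int = 200) -> float:
    """Average seconds for the same empty 'work' done as a function call."""
    def no_work() -> int:
        return 0

    start = time.perf_counter()
    for _ in range(iterations):
        no_work()
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    fork_cost = per_message_fork_cost()
    call_cost = per_message_call_cost()
    print(f"one fork + wait: {fork_cost * 1e6:.1f} us")
    print(f"one call:        {call_cost * 1e6:.3f} us")
    # The spam[cd] path described above costs three to five forks per message.
    print(f"3-5 forks per message: {3 * fork_cost * 1e3:.3f}-{5 * fork_cost * 1e3:.3f} ms")
```

Multiply the per-message figure by your message rate to see how much CPU time goes to forking alone, before a single rule runs.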

Fix that first, if you want to fix anything. Grab, or write, a version
of spamproxyd that you trust[1] with your email, then have inbound SMTP
talk directly to that and have it relay on to the real MTA.

That should give you one fork per message or, with extra work, less
than one (each forked process handling many messages). That is the best
way to improve your performance.
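
A minimal sketch of the "less than one fork per message" shape, in Python for brevity: fork a fixed pool of workers once and let each one score many messages. `looks_like_spam` is a hypothetical stand-in for the real scan, and handing out messages by index is a simplification; a real proxy would more likely have the workers accept connections on a shared listening socket. This is not how spamproxyd works internally, only the fork-count idea.

```python
import os


def looks_like_spam(message: str) -> bool:
    """Hypothetical stand-in for the real (expensive) rule scan."""
    return "free money" in message.lower()


def run_preforked(messages: list[str], workers: int = 4) -> tuple[dict[int, bool], int]:
    """Score messages with a fixed pool of forked workers (POSIX only).

    Returns (verdict per message index, number of forks performed).
    The fork count depends on `workers`, not on len(messages).
    """
    forks = 0
    readers = []
    pids = []
    for worker_id in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        forks += 1
        if pid == 0:
            # Child: score every message assigned to this worker, report, exit.
            status = 1
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "w") as out:
                    for i in range(worker_id, len(messages), workers):
                        out.write(f"{i}\t{int(looks_like_spam(messages[i]))}\n")
                status = 0
            finally:
                os._exit(status)
        # Parent: keep the read end; close the write end so EOF arrives
        # once the child finishes (later children never inherit it).
        os.close(write_fd)
        readers.append(os.fdopen(read_fd))
        pids.append(pid)

    verdicts = {}
    for reader in readers:
        with reader:
            for line in reader:
                index, flag = line.split("\t")
                verdicts[int(index)] = flag.strip() == "1"
    for pid in pids:
        os.waitpid(pid, 0)
    return verdicts, forks


if __name__ == "__main__":
    sample = ["Hello", "FREE MONEY now", "lunch?", "free money!!!", "meeting notes"] * 20
    verdicts, forks = run_preforked(sample, workers=4)
    print(forks, "forks for", len(sample), "messages,", sum(verdicts.values()), "flagged")
    # prints: 4 forks for 100 messages, 40 flagged
```

With 4 workers and 100 messages that is 0.04 forks per message, and the fork count stays at `workers` however many messages go through.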


Oh, and consider SMP hardware in this; SA testing is something that
should give you as close to linear performance improvement as you will
ever see in the real world -- for exactly the same reason that using
threads is a bad idea. :)

        Daniel

Footnotes: 
[1]  Even with the latest batch of patches[2] it looks to me like the
     existing spamproxyd can still lose email in a crash situation.

[2]  Which I only glanced at, I admit.

-- 
20+ years as a vegetarian and the guy who steals my credit card
orders $6,000 worth of chicken parts: proof that the most powerful
force in the universe is Irony.
        -- David Weinberger, _JOHO_ (2000-03-20)

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
