On Sat, 11 May 2002, Mail Admin wrote:

> Hi, I want to use spamassassin on a system where real heavy load
> exists. I have 540,000 incoming emails daily. I know spamc/spamd do
> well under moderate load, but this is not enough. Did anybody think
> of rewriting spamassassin in C,

Yup. It's been suggested here before and, in fact, someone said that
they have done so.

> and may be use a high performance threading library like pth

Nope. This was very clever of them, too, because they didn't pay the
stupidly high costs that introducing threading would have. Instead they
used a sane, tested and reasonable fork()-based implementation, which
gives all the benefits and none of the costs for this workload.

> for a native daemon like spamd and considering optimisation in rules
> matching?

That's being done and, frankly, there are three things that are likely
to be hotspots in SpamAssassin:

 * the Perl regexp engine running the rules.
 * the need to walk a message larger than L1 cache more than once.
 * forking, forking, forking.

For the first, *nothing* that you do is likely to improve things much
other than rewriting the rules themselves; this can be done equally
well in Perl.

For the second, rewriting SpamAssassin to use a streaming, single-pass
algorithm would be (ahem) challenging. I don't suggest it. :)

The third is where the real win is; more on that below.

> If nothing is done in that. I am ready to help in such project. If
> anybody is interested please mail me , and lets start.

Check the mailing list archives.

> If I am completely out of point and what I am thinking of will not
> help in getting better performance. Please tell me :)

It's not likely to, because you have not clearly identified where the
performance problems lie.

Every time you fork a process, you pay a huge cost. Avoiding that would
improve your throughput dramatically.

Using spam[cd] you pay at least *three* forks per message, at best, and
probably closer to five. Fix that first, if you want to fix anything.

Grab, or write, a version of spamproxyd that you trust[1] with your
email, then have inbound SMTP talk directly to that and have it relay
on to the real MTA.

That should give you one or, with extra work, fewer than one fork per
message. That is the best way to improve your performance.

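To put rough numbers on the fork cost, here is a back-of-envelope
sketch in Python. It uses only figures from this thread (540,000
messages a day; three to five forks per message with spamc/spamd; one
with the proxy setup). It assumes mail arrives evenly across the day,
which real traffic does not, so read the results as averages and expect
higher peaks. The name forks_per_second and the scenario labels are
made up for illustration.

```python
# Back-of-envelope fork load, using only the figures quoted in this thread.
# Assumption: mail arrives evenly over 24 hours (real peaks will be higher).
SECONDS_PER_DAY = 24 * 60 * 60


def forks_per_second(messages_per_day, forks_per_message):
    """Average fork rate for a given daily volume and per-message fork count."""
    return messages_per_day * forks_per_message / SECONDS_PER_DAY


MESSAGES_PER_DAY = 540_000  # volume stated in the original question

# Forks per message: 3 and 5 are the spamc/spamd estimates from the reply,
# 1 is the proxy arrangement it recommends. Labels are illustrative only.
scenarios = {
    "spamc/spamd, best case": 3,
    "spamc/spamd, more likely": 5,
    "SMTP proxy in front of the MTA": 1,
}

for name, forks in scenarios.items():
    rate = forks_per_second(MESSAGES_PER_DAY, forks)
    print(f"{name}: {rate:.2f} forks/sec")
```

At the quoted volume that is an average of roughly 19 to 31 forks per
second with spamc/spamd, against about 6 with one fork per message.
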
Oh, and consider SMP hardware in this; SA testing is something that
should give you as close to linear performance improvement as you will
ever see in the real world -- for exactly the same reason that using
threads is a bad idea. :)

        Daniel

Footnotes:
[1]  Even with the latest batch of patches[2] it looks to me like the
     existing spamproxyd can still lose email in a crash situation.

[2]  Which I only glanced at, I admit.

-- 
20+ years as a vegetarian and the guy who steals my credit card orders
$6,000 worth of chicken parts: proof that the most powerful force in
the universe is Irony.
        -- David Weinberger, _JOHO_ (2000-03-20)

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk