I recently discovered that roughly every 12 hours, SpamAssassin takes much longer than usual to process a message. It does this to run expiry on its Bayesian classifier database. (Confirmed by Matt Kettler, SA developer.)
This might be worth noting in the Qmail-Scanner documentation, as QS uses "spamc -t 30 -f" by default, which means QS can cut SA off after 30 seconds, before the expiry finishes. (There is also a 20 minute time limit imposed by QS.) Perhaps the QS docs should encourage users to use manual Bayes expiry, as this avoids the problem.

I ran into this on a QS system where the symptom was QS taking a very long time to process every message and then not giving any SA results. I think this was caused by SA trying to clean out its Bayes database but never getting the chance to finish.

-------- Original Message --------
Subject: Re: spamd throughput issues
Date: Mon, 10 Dec 2007 14:25:12 +0100 (CET)
From: Philipp Snizek <[EMAIL PROTECTED]>
To: Matt Kettler <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]

>> Philipp Snizek wrote:
>>> You use Bayes?
>>> Have you tried turning off auto_expire? From my experience this can
>>> cause significant performance issues.
>>
>> It shouldn't cause performance issues. It should only cause, at worst,
>> one message every 12 hours or so to take a long time (ie: 10 minutes).
>>
>> Unless you've got something that times out and winds up killing it
>> mid-expire.. in which case you'll end up retrying that expire
>> indefinitely.
>
> Quite a number of emails experienced a 10-minute timeout at spamc because
> of per-user bayes and timed out. They were forwarded unchecked to the
> user's inbox. Back then I had the impression that the more often the spamc
> children timed out the worse everything got.
> By manually expiring single user accounts I could reduce the timeouts even
> to zero within 48 hours but it never could be considered as a solved
> issue.
> Running bayes expire manually on a per user basis was not a good idea.
>
>> Also, if you do turn it off, you *MUST* run sa-learn --force-expire on a
>> regular basis, preferably via a cronjob. If you've got a per-user bayes
>> config, you *MUST* run this as every user.
>> Otherwise your bayes DB's will grow without bound.
>
> That's what I do every 24 hours (environment: 1700 users).
> Also, I changed from per-user bayes to global bayes. Scanning performance
> increased, bayes db is cleaned up and never larger than 300k tokens
> (before on a bayes-per-user basis: 70 million tokens).
> Spamd has been running very nicely since the changes.
>
> - Philipp

-- 
Rohan Carly

_______________________________________________
Qmail-scanner-general mailing list
Qmail-scanner-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/qmail-scanner-general
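
For reference, the manual-expiry setup discussed in the quoted thread could look roughly like the fragment below. This is only a sketch and has not been tested against Qmail-Scanner. bayes_auto_expire is the SpamAssassin setting that the thread refers to as auto_expire; the local.cf path and the 03:30 schedule are assumptions and will differ between installs.

```
# /etc/mail/spamassassin/local.cf  (path is an assumption; varies by install)
# Stop spamd from starting a Bayes expiry while a message is being scanned.
bayes_auto_expire 0

# crontab entry for the account that owns the Bayes database.
# The time of day is arbitrary; the thread mentions running this every 24 hours.
30 3 * * * sa-learn --force-expire
```

With a global Bayes database one cron entry should be enough. With per-user databases the same command would have to run as every user, as Matt points out in the thread.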