I recently discovered that every 12 hours or so, SpamAssassin will take much
longer than usual to process one message. It does this to expire old tokens
from its Bayesian classifier database. (Confirmed by Matt Kettler, SA
developer; see the forwarded message below.)

This might be worth noting in the Qmail-Scanner documentation, because QS
uses "spamc -t 30 -f" by default (a 30-second timeout), which means QS will
potentially kill SA before it finishes. (There is also a 20-minute time limit
imposed by QS.) Perhaps the QS docs should encourage users to use manual
Bayes expiry, as this avoids the problem.
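
As a minimal sketch of what manual expiry could look like (the file paths,
the 03:30 schedule and the use of crontab are my assumptions and will vary by
install; bayes_auto_expire and sa-learn --force-expire are the SA option and
command themselves):

```
# /etc/mail/spamassassin/local.cf (path is an assumption; varies by install)
# Stop SA from expiring Bayes tokens in the middle of scanning a message.
bayes_auto_expire 0

# crontab of the user that owns the Bayes database (schedule is an assumption)
# Expire old tokens once a day, outside of message scanning.
30 3 * * * /usr/bin/sa-learn --force-expire
```

With a per-user Bayes setup, the cron job would have to run as every user,
as Matt notes in the message below.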

I ran into this problem on a QS system. The symptom was that QS took a very
long time to process every message and then gave no SA results. I think this
was caused by SA trying to clean out its Bayes database but never getting the
chance to finish.

-------- Original Message --------
Subject: Re: spamd throughput issues
Date: Mon, 10 Dec 2007 14:25:12 +0100 (CET)
From: Philipp Snizek <[EMAIL PROTECTED]>
To: Matt Kettler <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]
>> > Philipp Snizek wrote: 
>>> >> You use Bayes?
>>> >> Have you tried turning off auto_expire? From my experience this can
>>> >> cause significant performance issues.
>>> >>
>> > It shouldn't cause performance issues. It should only cause, at worst,
>> > one message every 12 hours or so to take a long time (i.e. 10 minutes).
>> >
>> > Unless you've got something that times out and winds up killing it
>> > mid-expire, in which case you'll end up retrying that expire
>> > indefinitely.
> 
> Quite a number of emails hit the 10-minute timeout at spamc because of
> per-user Bayes. They were forwarded unchecked to the user's inbox. Back
> then I had the impression that the more often the spamc children timed
> out, the worse everything got.
> By manually expiring single user accounts I could reduce the timeouts,
> even to zero within 48 hours, but the issue could never be considered
> solved.
> Running Bayes expiry manually on a per-user basis was not a good idea.
> 
>> > Also, if you do turn it off, you *MUST* run sa-learn --force-expire on a
>> > regular basis, preferably via a cronjob. If you've got a per-user Bayes
>> > config, you *MUST* run this as every user. Otherwise your Bayes DBs
>> > will grow without bound.
> 
> That's what I do every 24 hours (environment: 1700 users).
> Also, I changed from per-user bayes to global bayes. Scanning performance
> increased, bayes db is cleaned up and never larger than 300k tokens
> (before on a bayes-per-user basis: 70 million tokens).
> Spamd has been running very nicely since the changes.
> 
> - Philipp

-- 
Rohan Carly

_______________________________________________
Qmail-scanner-general mailing list
Qmail-scanner-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/qmail-scanner-general
