Jason Frisvold wrote:
Hi all,
I've been investigating some recent slowness issues with our mail
servers and I noticed that the SpamAssassin database is getting rather
large. We process approximately 300,000 mails a day (or more). The
bayes_token database is over 1.8 Gig at the moment (actually, 1.8 Gig
for the data and 1.3 Gig for the index).
I checked the first few entries in the bayes_token database and the
atime on those entries was from back in Jan 2005. Is it safe to purge
this database and keep only "current" data? (I'm not sure what the
definition of current would be for a bayes database)
How about the other databases? Specifically, the awl database, which
is approximately 350 Meg, and the bayes_seen database, which is 150
Meg.
I think a lot of the slowdown I'm seeing right at this moment has to
do with SpamAssassin spending a good deal of time accessing the
database...
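For reference, a query along these lines shows how stale the table is
(this is only a sketch and assumes the stock SQL Bayes schema, where
atime is a unix timestamp):

  -- count tokens and find the oldest last-access time in bayes_token
  SELECT COUNT(*) AS tokens, FROM_UNIXTIME(MIN(atime)) AS oldest_atime
    FROM bayes_token;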
Sorry, I accidentally sent the previous (incomplete) message...
I'm by no means a Bayes specialist, but I don't think it's a good idea
to just delete the oldest entries, since SA provides its own means of
purging...
You might want to check your value for
bayes_expiry_max_db_size
and
sa-learn --dump magic
should give you the current number of tokens.
If you reduce the value of the above directive and issue a
sa-learn --force-expire
it should expire all tokens that are no longer needed, until the database reaches
(approximately) some value lower than max_db_size.
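For example (just a sketch -- the 150,000 here is only the stock default,
shown as a placeholder, and I'm assuming your site-wide config lives in
/etc/mail/spamassassin/local.cf):

  # in /etc/mail/spamassassin/local.cf
  bayes_expiry_max_db_size 150000

  # then, running as the user that owns the Bayes database:
  sa-learn --dump magic      # note the "ntokens" line before expiry
  sa-learn --force-expire
  sa-learn --dump magic      # ntokens should now be near the limit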
Hope that helps
Matt
PS: if you're using BerkeleyDB - I've read about lots of problems with big
databases under it -- consider switching to SQL.
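Something along these lines in local.cf would move Bayes (and the AWL)
into MySQL -- only a sketch; the DSN, database name and credentials below
are made up, and the table setup is described in the sql/ directory of
the SA distribution:

  # Bayes in SQL instead of BerkeleyDB
  bayes_store_module      Mail::SpamAssassin::BayesStore::SQL
  bayes_sql_dsn           DBI:mysql:spamassassin:localhost
  bayes_sql_username      sa_user
  bayes_sql_password      sa_pass

  # auto-whitelist in SQL as well
  auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList
  user_awl_dsn            DBI:mysql:spamassassin:localhost
  user_awl_sql_username   sa_user
  user_awl_sql_password   sa_pass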