This feels like a series of FAQs, but previous frequent answers don't
seem to answer my questions directly...

With SpamAssassin 3.1.4 I'm running spamd, and my global procmail uses
spamc to process mail. Individual users train/report with spamc too.
In an end-user account there's a .spamassassin directory, and this contains:

auto-whitelist 
bayes_toks
user_prefs
bayes_journal
bayes_seen

All of which makes sense... Over time, however, there is a build-up of
bayes_toks.expire$$$$ files (where each $ is a decimal digit), and I'm
unclear about these. Anecdotally, when there are lots of these files,
emails occasionally stop being processed by spamassassin, and both mail
and spam are delivered to my inbox without any spamassassin headers.
This happened most recently overnight and, from then on, no messages
were processed for spam. I restarted spamassassin and things seemed to
work again... I then ran sa-learn --force-expire and it reported keeping
~17,000 tokens and expiring ~6,000. My bayes_toks.expire$$$$ files
remained. This left me with lots of unanswered questions...
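
For reference, here is a minimal sketch of how the leftover files could be
listed with their sizes and ages, so the build-up can be compared against
the times mail stopped being scanned. It only reads the directory and
deletes nothing. The function name list_expire_files and the
~/.spamassassin path are my own choices for illustration, not anything
SpamAssassin provides.

```python
import glob
import os
import time


def list_expire_files(sa_dir):
    """Return (path, size_bytes, age_hours) for each bayes_toks.expire* file,
    oldest first. Hypothetical helper; read-only, nothing is removed."""
    now = time.time()
    rows = []
    for path in glob.glob(os.path.join(sa_dir, "bayes_toks.expire*")):
        st = os.stat(path)
        rows.append((path, st.st_size, (now - st.st_mtime) / 3600.0))
    # Largest age first, so the oldest leftovers appear at the top.
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows


if __name__ == "__main__":
    # Assumed location of the per-user SpamAssassin state directory.
    sa_dir = os.path.expanduser("~/.spamassassin")
    for path, size, age in list_expire_files(sa_dir):
        print(f"{age:8.1f}h  {size:10d}  {os.path.basename(path)}")
```

Running it before and after sa-learn --force-expire would at least show
whether the count changes.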

What causes the creation of a bayes_toks.expire$$$$ file?

Do bayes_toks.expire$$$$ files affect performance, or just consume disk
space?

What effect would deleting these files have on spamassassin's Bayesian
processing?

Is it likely that the 'failure' of spamassassin arose as a consequence
of a growing number of entries in bayes_toks, or is it more likely a
fault triggered by processing a malicious mail?

I've seen vague references to time-out settings - is this likely to be a
configuration issue (if so, which configuration options should be my focus)?

The fact that my forced expiry kept fewer than 75% of the tokens suggests
to me that expiry was not happening automatically... should it be? How
can I tell if it is working?

Should I be regularly forcing expiry from a cron job?

