This feels like a series of FAQs, but the existing answers I've found don't seem to address my questions directly...
With SpamAssassin 3.1.4 I'm running spamd, and my global procmail uses spamc to process mail. Individual users train/report with spamc too.

In an end-user account there's a .spamassassin directory, and this contains:

    auto-whitelist
    bayes_toks
    user_prefs
    bayes_journal
    bayes_seen

All of which makes sense... Over time, however, there is a build-up of bayes_toks.expire$$$$ files (where $ is a decimal digit) and I'm unclear about these.

Anecdotally, when there are lots of these bayes_toks.expire$$$$ files, from time to time emails stop being processed by spamassassin, and both mail and spam are delivered to my inbox without any spamassassin headers. This happened most recently overnight and, from that point on, no messages were processed for spam. I restarted spamassassin and things seemed to work again...

I ran sa-learn --force-expire and it reported keeping ~17,000 tokens and expiring ~6,000. My bayes_toks.expire$$$$ files remained.

This left me with lots of unanswered questions...

1. What causes the creation of a bayes_toks.expire$$$$ file?

2. Do bayes_toks.expire$$$$ files affect performance, or just consume disk space?

3. What effect would deleting these files have on spamassassin Bayesian processing?

4. Is it likely that the 'failure' of spamassassin arose as a consequence of a growing number of entries in bayes_toks, or is it more likely a fault triggered by processing a malicious mail?

5. I've seen vague references to time-out settings. Is this likely a configuration issue (if so, which configuration options should be my focus)?

6. The fact that my forced expiry kept < 75% of the tokens suggests to me that expiry was not happening automatically... should it be? How can I tell if it is working?

7. Should I be regularly forcing expiry from a cron job?