On 9 Mar 2021, at 7:49, Steve Dondley wrote:
> I've read through https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html which states that "anything over about 5000 messages does not improve accuracy significantly in our tests."
Did you read the section on expiration? https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html#expiration
> So once I hit 5,000, what do?
Be happy that you've reached near-optimal Bayes accuracy.
> Do I run --forget on say the 500 oldest emails, delete those from my ham/spam folders and then add in a batch of 500 newer ham/spam emails and then run sa-learn on all the emails in my spam/ham folders?
There are edge cases where using --force-expire periodically is necessary to get expiration to run often enough to avoid bloat, but unless you have autolearn on and high volume you are unlikely to run into that problem. If you are only doing manual learning, all should be well.
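To make that concrete, here is a rough sketch of what a manual training pass with an optional forced expiry might look like when scripted. It is only an illustration under stated assumptions: the helper names and the Maildir paths are hypothetical, and the only sa-learn options used are --ham, --spam and --force-expire. Note that there is no --forget step, since expiration is left to SpamAssassin.

```python
import shlex
import subprocess


def sa_learn_commands(ham_dir, spam_dir, force_expire=False):
    """Build the sa-learn argument lists for one manual training pass.

    ham_dir and spam_dir are hypothetical folder paths; adjust to your setup.
    """
    commands = [
        ["sa-learn", "--ham", ham_dir],
        ["sa-learn", "--spam", spam_dir],
    ]
    if force_expire:
        # Only for the high-volume/autolearn edge case where expiry
        # does not run often enough on its own.
        commands.append(["sa-learn", "--force-expire"])
    return commands


def run_training(ham_dir, spam_dir, force_expire=False, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    commands = sa_learn_commands(ham_dir, spam_dir, force_expire)
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops at the first sa-learn failure.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical Maildir layout, dry run only: nothing is executed.
    run_training("Maildir/.Ham/cur", "Maildir/.Spam/cur", force_expire=True)
```

With dry_run left at its default the script only prints the three command lines, so it can be checked before anything touches the Bayes database. For manual learning alone, force_expire can stay False.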
-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Not Currently Available For Hire