Steve Dondley <s...@dondley.com> writes:

> I've read through
> https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html which
> states that "anything over about 5000 messages does not improve
> accuracy significantly in our tests."

I would take that with a grain of salt. Based on my experience running
SA for many years, I'd say that if you have new spam that isn't like
the spam you already have, learning on it will help.

Also, I read it as a comment along the lines of "there's no need to try
hard to get more than 5K messages". It doesn't say "if you train on
more than 5000, bad things will happen".

> So once I hit 5,000, what do? Do I run --forget on say the 500 oldest
> emails, delete those from my ham/spam folders and then add in a batch
> of 500 newer ham/spam emails and then run sa-learn on all the emails
> in my spam/ham folders?

I've been running sa-learn daily over my ham folders and my spam
folders for years. I refile spam and ham so that it will be learned.
I find the bayes scoring is quite good except for novel spam. My
bayes_* files are about 83M in total.

So I don't think you necessarily have a problem to solve.
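
For illustration only, a daily run like the one described above could be
sketched roughly as follows. The folder paths and the function names
(build_commands, train) are hypothetical and not taken from any actual
setup; only the sa-learn options --ham and --spam are real. It defaults
to a dry run that just prints the commands.

```python
import subprocess
from pathlib import Path

# Hypothetical Maildir locations; adjust to wherever refiled mail lives.
HAM_DIR = Path.home() / "Maildir" / ".ham" / "cur"
SPAM_DIR = Path.home() / "Maildir" / ".spam" / "cur"


def build_commands(ham_dir, spam_dir):
    """Return the sa-learn invocations for one training pass."""
    return [
        ["sa-learn", "--ham", str(ham_dir)],
        ["sa-learn", "--spam", str(spam_dir)],
    ]


def train(ham_dir=HAM_DIR, spam_dir=SPAM_DIR, dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for cmd in build_commands(ham_dir, spam_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if sa-learn exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    train(dry_run=True)
```

Something like this would be started from cron once a day with
dry_run=False. Rerunning over the same folders should be harmless, since
sa-learn keeps track of messages it has already learned and skips them.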