On Tue, 2010-02-16 at 15:22 -0800, tonjg wrote:
> I've got a feeling that the spamassassin on my machine is improving in the
> way it recognises spam but I'd like to be sure it's not just my imagination.
> I did my first manual bayes learn about 2 weeks ago using 200 spams and 200
> hams, the process appeared to go properly. I read that autolearn is enabled
> by default and kicks in after 200 emails learnt, but is there a way to tell
> whether bayes is actually learning?
In addition to what the other respondents in this thread have said (sa-learn --dump magic), bear in mind that autolearn only works within set thresholds. These are configurable (bayes_auto_learn_threshold_spam and bayes_auto_learn_threshold_nonspam), but I forget what the defaults are for the moment.

What this means is that if the threshold for autolearning spam is set at 12, spam that is correctly identified as such but scores only about 6 - 11 points in SA will not be autolearned. By the same token, there is a maximum threshold for autolearning ham. I believe this is done for safety, to avoid autolearning FPs and FNs.

So you must still continue to train bayes manually with the mail that falls between those autolearn thresholds. I have a nightly cron job set up to read all my verified mail from my spam and ham folders and learn it as spam or ham respectively. It doesn't matter if the mail has already been learned - sa-learn will work that out for itself.

See man sa-learn for the most comprehensive help you will ever find in a man page!

HTH
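
P.S. To make the threshold behaviour above concrete, here is a rough sketch in Python. It is not SpamAssassin's actual code: the function name autolearn_decision is made up for illustration, and the values 12.0 and 0.1 are what I understand the stock defaults to be, so check the documentation for your version. As far as I know, the real autolearner also works from a score calculated without the bayes rules and applies some extra checks before learning spam, all of which this sketch leaves out.

```python
# Simplified illustration of the autolearn thresholds; NOT SpamAssassin's code.
# The function name is hypothetical. The threshold values are assumed to match
# the stock defaults; verify them against your own configuration.

SPAM_AUTOLEARN_THRESHOLD = 12.0  # cf. bayes_auto_learn_threshold_spam
HAM_AUTOLEARN_THRESHOLD = 0.1    # cf. bayes_auto_learn_threshold_nonspam


def autolearn_decision(score,
                       spam_threshold=SPAM_AUTOLEARN_THRESHOLD,
                       ham_threshold=HAM_AUTOLEARN_THRESHOLD):
    """Return 'spam', 'ham' or 'no' for a given message score."""
    if score >= spam_threshold:
        return "spam"   # high enough to be autolearned as spam
    if score < ham_threshold:
        return "ham"    # low enough to be autolearned as ham
    return "no"         # in between: left for manual training


if __name__ == "__main__":
    for score in (15.0, 8.0, -1.0):
        print(score, autolearn_decision(score))
```

A message scoring 8 is tagged as spam by SA (assuming the usual required score of 5) but comes back as "no" here, which is exactly the kind of mail that still needs sa-learn run on it by hand or from cron.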