On Sat, 06 Oct 2012 11:03:18 +0100
Arthur Dent wrote:

> Hello all,
>
> Following a hard drive crash I am rebuilding my small home server on a
> Fedora17 platform.
>
> One of the casualties of the HD crash was my spam corpus. I had a
> (very old) backup which happened to include a previous spam corpus so
> I used that to sa-learn.
>
> All my messages hit BAYES_00.
>
> I don't have many "fresh" spams. I do not run a SMTP server, I simply
> collect mail for my family and myself from my ISP and other sources
> using fetchmail. My ISP seem to filter most of the really bad stuff
> so I get just a trickle of spams (about 1 per day - if that) but even
> those hit BAYES_00 despite sometimes being identical to a previous FN
> that had already been learned with sa-learn.
>
> ...
> What - if anything - can I do to improve bayes performance?

I don't know if anyone got my previous reply to this; it just seemed to disappear into Gmail. What I suggested is that you retrain from the corpora without allowing any expiry, because the spammy tokens may be preferentially discarded. In general the expiry algorithm may not work well if you have fewer than a few hams or a few spams a day, because not enough tokens are having their atimes updated by classification.
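
Something along these lines is what I have in mind. It is only a sketch: it assumes you have already put `bayes_auto_expire 0` in local.cf or user_prefs so that no expiry runs while learning, and the corpus paths, the `DRY_RUN` switch and the `run`/`retrain` helpers are made up for illustration. By default it only prints the commands, since `sa-learn --clear` throws away the existing database.

```shell
#!/bin/sh
# Sketch: rebuild the Bayes database from saved corpora with expiry disabled.
# Assumption: "bayes_auto_expire 0" is already set in local.cf or user_prefs.
# SPAM_DIR and HAM_DIR are hypothetical paths; point them at your own corpora.
SPAM_DIR="${SPAM_DIR:-$HOME/corpus/spam}"
HAM_DIR="${HAM_DIR:-$HOME/corpus/ham}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

retrain() {
    run sa-learn --clear                       # wipe the existing Bayes database
    run sa-learn --no-sync --spam "$SPAM_DIR"  # learn spam, defer the journal sync
    run sa-learn --no-sync --ham "$HAM_DIR"    # learn ham, defer the journal sync
    run sa-learn --sync                        # merge the journal into the database
    run sa-learn --dump magic                  # show nspam, nham and token atimes
}

retrain
```

Run it once as-is to check the commands, then again with `DRY_RUN=0` and your real paths. If the corpora are mbox files rather than directories of messages, sa-learn needs `--mbox` as well. The `--dump magic` output at the end is where to check that the spam and ham counts look sane and how old the oldest token atime is.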