On Sat, 2012-10-06 at 12:36 -0700, John Hardin wrote:
> On Sat, 6 Oct 2012, Arthur Dent wrote:
>
> > Following a hard drive crash I am rebuilding my small home server on a
> > Fedora17 platform.
> >
> > One of the casualties of the HD crash was my spam corpus. I had a (very
> > old) backup which happened to include a previous spam corpus so I used
> > that to sa-learn.
> >
> > All my messages hit BAYES_00.
>
> Well, you're probably going to have to re-train from scratch.

Awwww...

> Review every message in your training corpora to ensure they are properly
> classified.
>
> Add a bunch of new ham and, if you have any, new spam.

Well, I have a bash script that runs every night. It copies mail from all
the folders in which I keep ham into a temporary folder, learns them as
ham, and then deletes the temporary folder.

I have two other folders: one for spam caught by SA or manually put there
by me, and another for "virus"-infected emails caught by clamav (which,
because I am using the Sanesecurity additional rules, are actually
phishes, scams and good old spam). The script does a similar thing with
these 2 folders and learns them as spam.

The same emails will get learned over and over again - but I believe this
is OK?

> Very old spam (say, >5 years) may not be too useful, and probably should
> be omitted, unless you have a very small spam corpus.

The backup I used was from ... ahem... 2008

> Turn off autolearn. I'm in a similar situation and hand-training on the
> rare misses works great for me.
>
> Also, given your low volume, I would recommend quarantining all spam, and
> not having a discard threshold score over which spams are thrown out
> unseen. Any that do get delivered can be reviewed and added to your
> spam training corpus.
>
> Zap your Bayes database, re-train and see how it goes.

I only have about 20 "fresh" spams in those two folders. Will Bayes be
deactivated until I get back to 200 spams?

Thanks (yet) again...

Mark
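
For reference, the nightly routine described above could be sketched
roughly as follows. This is only an illustration, written in Python rather
than taken from the actual bash script. The folder paths and function names
are hypothetical, it assumes the folders are in a format sa-learn can read
directly (such as Maildir), and it hands the folders straight to sa-learn
instead of copying them into a temporary folder first. The only sa-learn
options used are --ham and --spam.

```python
import shlex
import subprocess

# Hypothetical folder locations (assumptions, not from the original script).
HAM_FOLDERS = ["/home/user/Maildir/.Family", "/home/user/Maildir/.Lists"]
SPAM_FOLDERS = ["/home/user/Maildir/.Spam", "/home/user/Maildir/.Virus"]


def build_commands(ham_folders, spam_folders):
    """Return one sa-learn argument list per folder, ham first, then spam."""
    commands = []
    for folder in ham_folders:
        commands.append(["sa-learn", "--ham", folder])
    for folder in spam_folders:
        commands.append(["sa-learn", "--spam", folder])
    return commands


def learn(commands, dry_run=True):
    """Run each command, or only report it when dry_run is True.

    Returns the commands as shell-quoted strings so a caller can log them.
    """
    reported = []
    for argv in commands:
        reported.append(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if sa-learn fails.
            subprocess.run(argv, check=True)
    return reported


if __name__ == "__main__":
    # Dry run by default: print what would be executed, touch nothing.
    for line in learn(build_commands(HAM_FOLDERS, SPAM_FOLDERS), dry_run=True):
        print(line)
```

Keeping the command construction separate from the execution makes it easy
to check which folders would be learned as ham or spam before anything is
fed to Bayes, which matters when the corpus still needs to be reviewed for
misclassified messages.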