On Sat, 2012-10-06 at 12:36 -0700, John Hardin wrote:
> On Sat, 6 Oct 2012, Arthur Dent wrote:
> 
> > Following a hard drive crash I am rebuilding my small home server on a
> > Fedora 17 platform.
> >
> > One of the casualties of the HD crash was my spam corpus. I had a (very
> > old) backup which happened to include a previous spam corpus so I used
> > that to sa-learn.
> >
> > All my messages hit BAYES_00.
> 
> Well, you're probably going to have to re-train from scratch.

Awwww... 

> Review every message in your training corpora to ensure they are properly 
> classified.
> 
> Add a bunch of new ham and, if you have any, new spam.

Well, I have a bash script that runs every night. It copies the mail from
all of my ham folders into a temporary folder, learns it as ham and then
deletes the temporary folder.

I have two other folders: one for spam caught by SA or put there manually
by me, and another for "virus" infected emails caught by ClamAV (which,
because I am using the Sanesecurity additional rules, are actually
phishes, scams and good old spam). The script does the same thing with
these two folders and learns their contents as spam.
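For anyone wanting to reproduce this, a rough Python equivalent of that nightly job might look like the following. It is only a sketch under some assumptions: the folder paths are hypothetical, each path is assumed to be a directory holding one message per file that sa-learn can read directly, and instead of copying into a temporary folder it passes the folders straight to `sa-learn --ham` and `sa-learn --spam`.

```python
import subprocess

# Hypothetical folder locations; adjust to wherever the ham/spam really live.
# Each entry is assumed to be a directory containing one message per file.
HAM_DIRS = ["/home/mark/Maildir/.Family/cur", "/home/mark/Maildir/.Lists/cur"]
SPAM_DIRS = ["/home/mark/Maildir/.Spam/cur", "/home/mark/Maildir/.Virus/cur"]


def build_commands(ham_dirs, spam_dirs):
    """Return the sa-learn invocations for one nightly training run."""
    commands = []
    if ham_dirs:
        # --ham: learn every message found in the listed directories as ham
        commands.append(["sa-learn", "--ham", *ham_dirs])
    if spam_dirs:
        # --spam: learn the spam folder and the ClamAV "virus" folder as spam
        commands.append(["sa-learn", "--spam", *spam_dirs])
    return commands


def run_training(ham_dirs=HAM_DIRS, spam_dirs=SPAM_DIRS):
    """Run each sa-learn command in turn."""
    for cmd in build_commands(ham_dirs, spam_dirs):
        # check=True raises CalledProcessError if sa-learn exits non-zero
        subprocess.run(cmd, check=True)
```

Calling `run_training()` from cron would do the actual learning; keeping the command construction in its own function makes it easy to see what would be run before letting it loose on the Bayes database.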

The same emails will get learned over and over again, but I believe
this is OK?

> Very old spam (say, >5 years) may not be too useful, and probably should 
> be omitted, unless you have a very small spam corpus.

The backup I used was from... ahem... 2008.

> Turn off autolearn. I'm in a similar situation and hand-training on the 
> rare misses works great for me.
> 
> Also, given your low volume, I would recommend quarantining all spam, and 
> not having a discard threshold score over which spams are thrown out 
> unseen. Any that do get delivered can be reviewed and added to your 
> spam training corpus.
> 
> Zap your Bayes database, re-train and see how it goes.

I only have about 20 "fresh" spams in those two folders. Will Bayes be
deactivated until I get back to 200 spams?
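For reference, SpamAssassin's defaults for `bayes_min_spam_num` and `bayes_min_ham_num` are both 200, and the Bayes classifier is not used for scoring until both counts are reached. A minimal sketch for checking where the database stands, assuming the `nspam` and `nham` lines printed by `sa-learn --dump magic` have their usual layout (four numeric columns followed by a `non-token data:` label, with the count in the third column). The function names are hypothetical.

```python
import re

# SpamAssassin defaults for bayes_min_spam_num / bayes_min_ham_num;
# change these if local.cf overrides them.
MIN_SPAM = 200
MIN_HAM = 200

# Matches lines such as "0.000  0  20  0  non-token data: nspam".
# The third column is assumed to hold the count.
_COUNT_RE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$"
)


def parse_counts(dump_magic_output):
    """Extract the nspam and nham counts from `sa-learn --dump magic` text."""
    counts = {"nspam": 0, "nham": 0}
    for line in dump_magic_output.splitlines():
        match = _COUNT_RE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_ready(counts, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True once both spam and ham counts meet the Bayes minimums."""
    return counts["nspam"] >= min_spam and counts["nham"] >= min_ham


# Illustrative input only, not real output from this server.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0         20          0  non-token data: nspam
0.000          0       1500          0  non-token data: nham
"""
counts = parse_counts(sample)
print(counts, bayes_ready(counts))  # prints {'nspam': 20, 'nham': 1500} False
```

In practice the text could come from `subprocess.run(["sa-learn", "--dump", "magic"], capture_output=True, text=True).stdout` rather than a hard-coded sample.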

Thanks (yet) again...

Mark
 
