Re: BAYES_00

RW Sat, 06 Oct 2012 19:17:29 -0700

On Sat, 06 Oct 2012 11:03:18 +0100
Arthur Dent wrote:

> Hello all,
> 
> Following a hard drive crash I am rebuilding my small home server on a
> Fedora17 platform.
> 
> One of the casualties of the HD crash was my spam corpus. I had a
> (very old) backup which happened to include a previous spam corpus so
> I used that to sa-learn.
> 
> All my messages hit BAYES_00. 
> 
> I don't have many "fresh" spams. I do not run a SMTP server, I simply
> collect mail for my family and myself from my ISP and other sources
> using fetchmail. My ISP seem to filter most of the really bad stuff
> so I get just a trickle of spams (about 1 per day - if that) but even
> those hit BAYES_00 despite sometimes being identical to a previous FN
> that had already been learned with sa-learn.
> 
> ...
> What - if anything - can I do to improve bayes performance?


I don't know whether anyone received my previous reply to this; it just
seemed to disappear into gmail.

What I suggested is that you retrain from the corpora with expiry
disabled, because otherwise the spammy tokens may be preferentially
discarded.
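
As a rough sketch of what that could look like (I'm assuming a
site-wide local.cf here; adjust for wherever your configuration
actually lives), turn off automatic expiry before relearning:

```
# local.cf (assumed location; depends on your install)
# Stop SpamAssassin from expiring Bayes tokens on its own.
bayes_auto_expire 0
```

With that in place you could wipe the database with sa-learn --clear
and then relearn each corpus with sa-learn --spam and sa-learn --ham.
Afterwards, sa-learn --dump magic should report the learned message
and token counts, so you can check whether the tokens are surviving.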

In general, the expiry algorithm may not work well if you receive fewer
than a few hams or a few spams a day, because not enough tokens are
having their atimes updated by classification.
