Re: Help with bayes

Karsten Bräckelmann Tue, 18 Nov 2008 13:02:17 -0800

On Tue, 2008-11-18 at 15:19 -0500, Troy Settle wrote:
> Kai Schaetzl wrote:
> > Troy Settle wrote on Mon, 17 Nov 2008 13:33:10 -0500:
> >
> >> I'm having a major problem with the bayes system.  I cleared the bayes 
> >> database and let it start re-learning.  Once it kicked in, I again 
> >> started getting false hits with BAYES_00=-2.599 on a great many spam/uce 
> >> messages.
> >
> > How did you "let it start re-learning"? What's the output of sa-learn dump 
> > magic?
>
> From incoming mail.  I'm still working on building a corpus suitable 
> for sa-learn.


You *need* to train on error.  Also, you definitely will want to
manually learn, at the very least until Bayes has been trained properly.

If you rely solely on auto-learning, there are a great many spams that
will not be learned, and those are pretty much exactly the ones where
Bayes can make a difference!

  http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html#learning_options
  http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

By default, auto-learning will *not* learn any spam with a total score
below 12 (calculated without Bayes, etc.), or with fewer than 3 points
from body tests or fewer than 3 points from header tests. It won't learn
ham with a score above 0.1 either. This is a safety measure.
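
To make those thresholds concrete, here is a rough sketch of the decision
in Python. It is *not* SpamAssassin's actual code (which is Perl); the
function and constant names are made up for illustration, the exact
handling of scores sitting right on a threshold is an assumption, and
the real plugin has further conditions that are left out here.

```python
# Simplified illustration of the default auto-learn thresholds.
# Names are hypothetical; this is not the plugin's implementation.

SPAM_THRESHOLD = 12.0     # default bayes_auto_learn_threshold_spam
HAM_THRESHOLD = 0.1       # default bayes_auto_learn_threshold_nonspam
MIN_HEADER_POINTS = 3.0   # minimum points required from header tests
MIN_BODY_POINTS = 3.0     # minimum points required from body tests


def autolearn_decision(score, header_points, body_points):
    """Return "spam", "ham", or None (message is not auto-learned).

    `score` is the total computed without the Bayes rules.
    Boundary behaviour (>= versus >) is an assumption of this sketch.
    """
    if (score >= SPAM_THRESHOLD
            and header_points >= MIN_HEADER_POINTS
            and body_points >= MIN_BODY_POINTS):
        return "spam"
    if score < HAM_THRESHOLD:
        return "ham"
    # Everything in between is left alone by auto-learning.
    return None


if __name__ == "__main__":
    # A spam that is tagged at 7.5 points but never auto-learned.
    print(autolearn_decision(7.5, 2.0, 5.5))
```

A spam scoring, say, 7.5 falls between the two thresholds, so it is
never fed to Bayes automatically. That is the kind of mail you have to
learn by hand.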


> FWIW, how bad would I screw things up if I were to override the BAYES_00 
> score to 0?

That's not gonna solve your problems. You'd better properly train Bayes
on the stuff not auto-learned, so it will eventually learn the
difference between ham and spam. So far it only knows about the extreme
ends, which really don't need Bayes to make a difference anyway.

  guenther


-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
