On Tue, 2008-11-18 at 15:19 -0500, Troy Settle wrote:
> Kai Schaetzl wrote:
> > Troy Settle wrote on Mon, 17 Nov 2008 13:33:10 -0500:
> >
> >> I'm having a major problem with the bayes system. I cleared the bayes
> >> database and let it start re-learning. Once it kicked in, I again
> >> started getting false hits with BAYES_00=-2.599 on a great many spam/uce
> >> messages.
> >
> > How did you "let it start re-learning"? What's the output of sa-learn dump
> > magic?
>
> From incoming mail. I'm still working on building a corpus suitable
> for sa-learn.

You *need* to train on error. Also, you will definitely want to learn
manually, at the very least until Bayes has been trained properly.

If you rely solely on auto-learning, there are a great many spams that
will not be learned. And those are pretty much exactly the ones where
Bayes can make a difference!

  http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Conf.html#learning_options
  http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

By default, auto-learning will *not* learn any spam with a total score
less than 12 (without Bayes, etc). It also will not learn spam that
scores less than 3 points from body tests or less than 3 points from
header tests. It won't learn ham with a score above 0.1 either. This is
a safety measure.

> FWIW, how bad would I screw things up if I were to override the BAYES_00
> score to 0?

That's not gonna solve your problems. You'd better properly train Bayes
on the stuff not auto-learned, so it will eventually learn the
difference between ham and spam. So far it only knows about the extreme
ends, which really don't need Bayes to make a difference anyway.

  guenther


-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
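
P.S. To make the default auto-learn thresholds described above concrete, here is a rough Python sketch of the decision. It is only an illustration of the rule as stated in this message, not SpamAssassin's actual code. The function name, the constant names, and the exact handling of scores that land precisely on a threshold are assumptions.

```python
from typing import Optional

# Default values as quoted in this message. The constant names are
# hypothetical and are not SpamAssassin option names.
SPAM_THRESHOLD = 12.0     # minimum total score (computed without Bayes) for spam
HAM_THRESHOLD = 0.1       # maximum total score for ham
MIN_HEADER_POINTS = 3.0   # minimum points required from header tests
MIN_BODY_POINTS = 3.0     # minimum points required from body tests


def auto_learn_decision(score_without_bayes: float,
                        header_points: float,
                        body_points: float) -> Optional[str]:
    """Return "spam", "ham", or None (message is not auto-learned)."""
    if score_without_bayes >= SPAM_THRESHOLD:
        # A high total alone is not enough: both the header tests and the
        # body tests must contribute their minimum share.
        if header_points >= MIN_HEADER_POINTS and body_points >= MIN_BODY_POINTS:
            return "spam"
        return None
    if score_without_bayes <= HAM_THRESHOLD:
        return "ham"
    # Everything between the two thresholds is left alone. This is the mail
    # that has to be trained manually.
    return None


if __name__ == "__main__":
    samples = [
        (15.0, 4.0, 5.0),   # high score, header and body both contribute
        (15.0, 1.0, 9.0),   # high score, but header tests contribute too little
        (7.5, 3.0, 3.0),    # between the thresholds
        (-1.2, 0.0, 0.0),   # clearly low score
    ]
    for score, header, body in samples:
        print(score, header, body, "->", auto_learn_decision(score, header, body))
```

The third sample is the interesting one: a spam scoring 7.5 is never auto-learned, and that middle range is exactly what needs manual training.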