On Mon, 2009-01-19 at 20:03 -0700, LuKreme wrote:
> <http://home.kreme.com/ephspam.txt>
>
> Gotten a few like this in the last day or two. I have no idea why
> they are hitting Bayes_99.

Err, you are seriously confusing me. :)  It is intended.

BAYES_99 means the Bayesian probability that this message is spam is
99% or more. Which it is. BAYES_50, a probability of 0.5, translates to
"unsure". Anything below that is hammy. The lower the score the hammier,
the higher the score the spammier.

Reading your question, I get the impression you don't understand the
Bayes values. And, much more serious, I can't help but read it as your
spam does *not* usually hit BAYES_9x. If so, your Bayes DB is trained
badly.

> The account in question runs untagged spam through sa-learn pretty
> frequently (at least once a week, sometimes more).

You should also train low scoring (tagged) spam. Or, even better, train
spam that was identified as such but got a "low" Bayes score. Similar
for ham.

> $ sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0      32600          0  non-token data: nspam
> 0.000          0      83822          0  non-token data: nham
> The nspam/nham numbers are ok, aren't they? I supposed I could run
> more messages at nham?

More *spam*, not ham. About equal values are best according to the docs,
though 10 times more spam than ham typically doesn't do any harm. Given
your comments and these numbers, your Bayes isn't trained sufficiently
on spam.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
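
As a rough check of the nspam/nham balance discussed above, here is a
minimal Python sketch. It assumes the column layout of the
`sa-learn --dump magic` output quoted in this message (the count in the
third column, the label after "non-token data:"). The names parse_magic
and SAMPLE are hypothetical, made up for illustration, and are not part
of SpamAssassin.

```python
# Sketch only: assumes the dump layout quoted in this thread.
SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 32600 0 non-token data: nspam
0.000 0 83822 0 non-token data: nham
"""


def parse_magic(dump_text):
    """Return the nspam and nham counts found in the dump text."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Expected shape: <float> <int> <count> <int> non-token data: <label>
        if len(fields) >= 7 and fields[4:6] == ["non-token", "data:"]:
            label = fields[6]
            if label in ("nspam", "nham"):
                counts[label] = int(fields[2])
    return counts


counts = parse_magic(SAMPLE)
ham_per_spam = counts["nham"] / counts["nspam"]
print(f"nspam={counts['nspam']} nham={counts['nham']} "
      f"ham/spam={ham_per_spam:.2f}")
# prints: nspam=32600 nham=83822 ham/spam=2.57
```

For the numbers quoted here that is roughly 2.6 ham per spam, so the
database has learned noticeably more ham than spam, which is why the
advice is to feed it more spam rather than more ham.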