Re: Spam hitting Bayes_99?

Karsten Bräckelmann Tue, 20 Jan 2009 07:04:46 -0800

On Mon, 2009-01-19 at 20:03 -0700, LuKreme wrote:
> <http://home.kreme.com/ephspam.txt>
> 
> Gotten a few like this in the last day or two.  I have no idea why  
> they are hitting Bayes_99.


Err, you are seriously confusing me. :)

That is intended. BAYES_99 means the Bayesian probability that this
message is spam is 99% or more. Which it is.

BAYES_50, a probability of 0.5, translates to "unsure". Anything below
that is hammy. The lower the score the hammier, the higher the score the
spammier.
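
To make the mapping concrete, here is a minimal sketch of how a Bayes
probability could be turned into a BAYES_xx rule name. The bucket
boundaries are an assumption modelled on the stock rule set, and the
names BUCKETS and bayes_rule are made up for illustration, so check the
Bayes rule definitions shipped with your own install before relying on
them.

```python
# Assumed upper bounds for each bucket (illustrative, verify locally).
# A probability p falls into the first bucket whose upper bound exceeds it.
BUCKETS = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),  # around 0.5: "unsure"
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
]


def bayes_rule(p):
    """Return the (assumed) BAYES_xx rule name for a spam probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for upper, name in BUCKETS:
        if p < upper:
            return name
    # Everything at 0.99 or above is the spammiest bucket.
    return "BAYES_99"


if __name__ == "__main__":
    for p in (0.0, 0.3, 0.5, 0.97, 0.995):
        print(p, bayes_rule(p))
```

Well trained, most of your spam should land in the top buckets and most
of your ham in the bottom ones.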

Reading your question, I get the impression you don't understand the
Bayes values. And, more seriously, I can't help but read it as saying
your spam does *not* usually hit BAYES_9x. If so, your Bayes DB is badly
trained.

> The account in question runs untagged spam through sa-learn pretty  
> frequently (at least once a week, sometimes more).

You should also train low scoring (tagged) spam. Or, even better, train
any identified spam that got a "low" Bayes score. Similarly for ham.

> $ sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0      32600          0  non-token data: nspam
> 0.000          0      83822          0  non-token data: nham

> The nspam/nham numbers are ok, aren't they?  I supposed I could run  
> more messages at nham?

More *spam*, not ham. Roughly equal counts are best according to the
docs, though 10 times more spam than ham typically doesn't do any harm.
Given your comments and these numbers (about 2.5 times as much ham as
spam), your Bayes isn't trained sufficiently on spam.
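
If you want to keep an eye on that balance, a rough sketch like the
following pulls nspam and nham out of the "sa-learn --dump magic" output
and prints the ratio. It assumes the column layout shown in the quoted
dump above (the count is the third field), and parse_magic is just an
illustrative helper name, not part of SpamAssassin.

```python
# Sample taken from the dump quoted in this thread.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      32600          0  non-token data: nspam
0.000          0      83822          0  non-token data: nham
"""

MARKER = "non-token data:"


def parse_magic(text):
    """Map each 'non-token data' label to its count (third column)."""
    counts = {}
    for line in text.splitlines():
        # Tolerate mail quoting ("> ") in front of pasted output.
        line = line.lstrip("> ").strip()
        if MARKER not in line:
            continue
        fields, label = line.split(MARKER, 1)
        parts = fields.split()
        if len(parts) >= 3:
            counts[label.strip()] = int(parts[2])
    return counts


counts = parse_magic(SAMPLE)
ratio = counts["nspam"] / counts["nham"]

if __name__ == "__main__":
    print("nspam:", counts["nspam"])
    print("nham: ", counts["nham"])
    print("spam/ham ratio: %.2f" % ratio)
```

A ratio well below 1, as here, means the DB has learned far more ham
than spam.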


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
