On Tue, 2009-01-20 at 16:52 +0100, Matus UHLAR - fantomas wrote:
> > On 20-Jan-2009, at 08:04, Karsten Bräckelmann wrote:
> > >You should also train low scoring (tagged) spam. Or, even better, train
> > >those identified spam with a "low" Bayes score. Similar for ham.
> 
> On 20.01.09 08:19, LuKreme wrote:
> > I thought tagged spam was automatically learned by bayes?
> > 
> > Isn't that what bayes_auto_learn does?
> 
> Only if you set it up so, and only if it fulfills some expectations, e.g.
> some minimal score, some minimal score by header checks, some minimal score
> by body checks...

By default, spam is auto-learned only with a total score of at least 12,
with at least 3 points each from body and header rules. This is a safety
measure. Also, that score is calculated without Bayes and AWL. Plus there
are some more esoteric constraints I forgot.
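
To illustrate, a rough sketch of that rule in Python. This is not the
actual SpamAssassin code. The names below are made up for illustration,
the numbers are just the defaults quoted above, and the additional
constraints are not modelled.

```python
from dataclasses import dataclass

# Assumed defaults as quoted in this thread, hard-coded for illustration.
SPAM_THRESHOLD = 12.0      # minimum total score
MIN_HEADER_POINTS = 3.0    # minimum points from header rules
MIN_BODY_POINTS = 3.0      # minimum points from body rules


@dataclass
class Scores:
    """Hypothetical per-message score breakdown, excluding Bayes and AWL."""
    header: float
    body: float
    other: float = 0.0     # any remaining points, also without Bayes/AWL


def would_autolearn_as_spam(s: Scores) -> bool:
    """Return True if the simplified auto-learn rule for spam is met."""
    total = s.header + s.body + s.other
    return (
        total >= SPAM_THRESHOLD
        and s.header >= MIN_HEADER_POINTS
        and s.body >= MIN_BODY_POINTS
    )
```

So a message with 11 points from header rules and only 2 from body rules
would not be auto-learned, even though its total is above 12. The total
threshold is what the bayes_auto_learn_threshold_spam option controls.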

Thus, yes, it makes perfect sense to manually learn low scoring spam.


> manual training on any FPs/FNs that were not correctly autolearned from is a
> good idea.

Of course. Though those are the extremes only. Again, it also makes
sense to learn *correctly* classified mail, if it isn't auto-learned.
Even more so, if the Bayes value is close to 0.5 -- that's BAYES_50.
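
One way to pick such mail for manual training, again as a hedged Python
sketch. The tuple layout and the margin of 0.2 are my assumptions for
illustration, nothing SpamAssassin itself provides.

```python
def manual_training_candidates(messages, margin=0.2):
    """Pick mail worth training by hand.

    messages: iterable of (msg_id, bayes_prob, autolearned) tuples,
    a hypothetical layout. Keeps mail that was not auto-learned and
    whose Bayes probability lies within `margin` of 0.5, ordered with
    the most uncertain first.
    """
    picked = [
        m for m in messages
        if not m[2] and abs(m[1] - 0.5) <= margin
    ]
    return sorted(picked, key=lambda m: abs(m[1] - 0.5))
```

Whatever this selects still needs a human to decide whether it is ham
or spam, and then it goes to sa-learn --ham or sa-learn --spam.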

Teaching your Bayes about what's ham and spam, especially in the gray
area, will improve the results.

  guenther

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
