On Tue, 2009-01-20 at 16:52 +0100, Matus UHLAR - fantomas wrote:
> > On 20-Jan-2009, at 08:04, Karsten Bräckelmann wrote:
> > >You should also train low scoring (tagged) spam. Or, even better, train
> > >those identified spam with a "low" Bayes score. Similar for ham.
>
> On 20.01.09 08:19, LuKreme wrote:
> > I thought tagged spam was automatically learned by bayes?
> >
> > Isn't that what bayes_auto_learn does?
>
> Only if you set it up so, and only if it fulfills some expectations, e.g.
> some minimal score, some minimal score by header checks, some minimal score
> by body checks...

A total score of at least 12, with at least 3 points each from body and
header rules -- by default. This is a safety measure. Also, that score is
computed without Bayes and AWL. Plus some more esoteric constraints I forgot.

Thus, yes, it makes perfect sense to manually learn low scoring spam.

> manual training on any FPs/FNs that were not correctly autolearned from is a
> good idea.

Of course. Though those are the extremes only. Again, it also makes sense
to learn *correctly* classified mail, if it isn't auto-learned. Even more
so if the Bayes value is close to 0.5 -- that's BAYES_50.

Teaching your Bayes about what's ham and spam, especially in the gray area,
will improve the results.

  guenther

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
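
For illustration only, the default auto-learn conditions for spam described in this message can be sketched in Python. This is not SpamAssassin's actual code: the names `Scores`, `would_autolearn_as_spam`, `manual_training_priority` and `GRAY_MARGIN` are hypothetical, the "more esoteric constraints" are left out, and the 0.1 margin around a Bayes value of 0.5 is an assumption made purely for the example.

```python
from dataclasses import dataclass

# Defaults as stated in the message: a total of at least 12 points, with at
# least 3 points each from header rules and body rules.
SPAM_TOTAL_MIN = 12.0
HEADER_MIN = 3.0
BODY_MIN = 3.0

# Assumption for this sketch: how close to 0.5 counts as the "gray area".
GRAY_MARGIN = 0.1


@dataclass
class Scores:
    """Rule points for one message, with Bayes and AWL already excluded."""
    header: float       # points from header rules
    body: float         # points from body rules
    other: float = 0.0  # points from any remaining rules


def would_autolearn_as_spam(s: Scores) -> bool:
    """Check only the three default conditions named in the message."""
    total = s.header + s.body + s.other
    return (
        total >= SPAM_TOTAL_MIN
        and s.header >= HEADER_MIN
        and s.body >= BODY_MIN
    )


def manual_training_priority(autolearned: bool, bayes_prob: float) -> str:
    """Rank how useful manual training is for a correctly classified mail."""
    if autolearned:
        return "none"    # Bayes has already learned from this mail
    if abs(bayes_prob - 0.5) <= GRAY_MARGIN:
        return "high"    # Bayes is undecided, so training helps most
    return "normal"      # not auto-learned, still worth training


if __name__ == "__main__":
    tagged = Scores(header=2.0, body=5.5)
    print(would_autolearn_as_spam(tagged))
    print(manual_training_priority(False, 0.5))
```

A message tagged as spam with, say, 7.5 points misses these conditions, which is why training it by hand is still worthwhile.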