On Thu, 2009-08-27 at 11:25 +0200, Benny Pedersen wrote:
> On Thu 27 Aug 2009 02:34:16 AM CEST, Karsten Bräckelmann wrote
> > Also, I do agree with the post by RW. By lowering the auto-learn ham
> > threshold you managed to get the ratio more sane. However, continuing to
> > do so you won't really learn any ham, but spam only.
>
> not if nham is bigger than nspam, these counters say if your threshold
> is good or bad imho, and they also show what to tweak to get more
> learning in bayes

Benny, what the heck are you talking about? Have a look at the OP again,
specifically the numbers and how they changed over time and with the conf
changes. That's what *we* are talking about.

And no, "nham > nspam" is irrelevant on its own. The important part is
how it developed.

> if a spam msg scores 5.1 it's unsafe to learn as spam, and if a ham
> msg scores 4.9 it's unsafe to learn as ham

This is as wrong as it can get, if you are talking about manual
training. And with auto-learning, this won't happen anyway.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}