RE: corpus ham/spam balance

Karsten Bräckelmann Thu, 27 Aug 2009 06:30:01 -0700

On Thu, 2009-08-27 at 11:25 +0200, Benny Pedersen wrote:
> On Thu 27 Aug 2009 02:34:16 AM CEST, Karsten Bräckelmann wrote
> > Also, I do agree with the post by RW. By lowering the auto-learn ham
> > threshold you managed to get the ratio more sane. However, continuing to
> > do so you won't really learn any ham, but spam only.
> 
> not if nham is bigger than nspam; these counters tell you whether your
> threshold is good or bad, imho, and they also show what to tweak to get
> more learning in bayes


Benny, what the heck are you talking about? Have a look at the OP again,
specifically the numbers and how they changed over time and with the
config changes.

That's what *we* are talking about. And no, "nham > nspam" is irrelevant
on its own. The important part is how it developed.


> if a spam msg scores 5.1 it's unsafe to learn as spam, and if a ham
> msg scores 4.9 it's unsafe to learn as ham

This is as wrong as it can get, if you are talking about manual
training. And with auto-learning, this won't happen anyway.
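
To illustrate why scores of 4.9 or 5.1 are never auto-learned, here is a
minimal sketch of the threshold check. It is not SpamAssassin's actual code.
It assumes the stock defaults of bayes_auto_learn_threshold_nonspam (0.1) and
bayes_auto_learn_threshold_spam (12.0); the function name and the exact
boundary handling are hypothetical, and the real auto-learner applies further
conditions (it works on a score computed without the Bayes rules, for
example) that are left out here.

```python
# Simplified sketch of an auto-learn decision; NOT SpamAssassin's real code.
# Default arguments mirror the stock settings (an assumption about the setup):
#   bayes_auto_learn_threshold_nonspam 0.1
#   bayes_auto_learn_threshold_spam    12.0

def auto_learn_decision(score, ham_threshold=0.1, spam_threshold=12.0):
    """Return "ham", "spam", or None when the message is not auto-learned."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:  # boundary handling is an assumption
        return "spam"
    return None  # everything in between is left alone


if __name__ == "__main__":
    for score in (-1.0, 4.9, 5.1, 15.0):
        print(score, auto_learn_decision(score))
```

With those defaults, both 4.9 and 5.1 fall in the gap between the two
thresholds, so neither message is fed to Bayes automatically.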


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
