Re: BAYES question

Karsten Bräckelmann Fri, 26 Apr 2013 18:54:51 -0700

On Fri, 2013-04-26 at 21:25 -0400, Joe Acquisto-j4 wrote:
> Well, right now, there are no bayes hits at all.   I cleared bayes to
> re-train, after correcting for a botched initial scheme.
> 
> While I am getting a fair amount of missed spam, there is very little
> mis-classified.
> 
> So I am looking for a way to speed up learning.


Initial training. Train on existing, verified corpora.
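Below is a minimal Python sketch of such an initial run. It assumes your verified ham and spam are kept in separate directories; the paths and the helper name build_training_commands are hypothetical. It only builds sa-learn command lines, one per corpus, using the --ham and --spam options. It adds --mbox when the corpora are mbox files rather than one message per file. You would run each command yourself or hand it to subprocess.run.

```python
import shlex


def build_training_commands(ham_dirs, spam_dirs, mbox=False):
    """Return sa-learn argv lists for an initial training run.

    ham_dirs / spam_dirs are hypothetical paths to hand-verified corpora.
    """
    extra = ["--mbox"] if mbox else []
    cmds = []
    for d in ham_dirs:
        cmds.append(["sa-learn", "--ham", *extra, d])
    for d in spam_dirs:
        cmds.append(["sa-learn", "--spam", *extra, d])
    return cmds


# Hypothetical example paths; nothing is executed, the commands are only printed.
for argv in build_training_commands(["/home/joe/verified-ham"],
                                    ["/home/joe/verified-spam"]):
    print(shlex.join(argv))
```

Keeping ham and spam in separate, already-verified corpora is the point: whatever you feed to --ham or --spam is taken as ground truth.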

> So, I could just feed a bunch of good mail, to --ham, and spam that is
> correctly marked as spam as well as missed spam, to --spam?

Yes. By default, Bayes will not be used for scoring (it does learn,
though) until at least 200 spam and 200 ham have been learned.

So by training, you can have Bayes kick in earlier.
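To see whether you are past that threshold, sa-learn --dump magic reports the learned counts as nspam and nham. The sketch below parses those two lines and applies the rule of at least 200 of each. It assumes the usual layout, where the count is the third column of those lines. The 200 values mirror the defaults of the bayes_min_ham_num and bayes_min_spam_num settings. The function names and the sample output are made up for illustration.

```python
import re

# Mirror the defaults of SpamAssassin's bayes_min_ham_num / bayes_min_spam_num.
MIN_HAM = 200
MIN_SPAM = 200


def parse_magic(dump):
    """Return {'nspam': ..., 'nham': ...} from `sa-learn --dump magic` text."""
    counts = {}
    for line in dump.splitlines():
        m = re.search(r"non-token data: (nham|nspam)\s*$", line)
        if m:
            # Assumed layout: the count is the third whitespace-separated field.
            counts[m.group(1)] = int(line.split()[2])
    return counts


def bayes_scoring_active(nham, nspam, min_ham=MIN_HAM, min_spam=MIN_SPAM):
    # Bayes learns from the start, but only scores once BOTH minimums are met.
    return nham >= min_ham and nspam >= min_spam


# Made-up sample output: plenty of ham, not yet enough spam.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        950          0  non-token data: nham
"""
counts = parse_magic(sample)
print(counts, bayes_scoring_active(counts["nham"], counts["nspam"]))
# {'nspam': 150, 'nham': 950} False
```

Note that 950 ham alone does not help: with only 150 spam learned, Bayes still stays out of scoring.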

Ham usually does not change much over time. Spam does, significantly.
Training on 1000 ham received over the last months, years, whatever, is
thus generally OK. You'd want to limit the time span for training spam,
though. And it is good advice to keep the initial training corpora at a
ratio roughly resembling your actual ham/spam ratio, or maybe 1/1. (At
this point, we're approaching voodoo. Learning 10 times more ham than
spam is most likely a bad choice, though.)
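The selection advice above can be sketched in Python. The sketch keeps ham regardless of age and keeps only recent spam. It then trims whichever side is larger so the set lands near a chosen ham-per-spam ratio. The 90-day spam window and the 1.0 default ratio are assumptions for illustration, not numbers from this thread. Messages are modeled as hypothetical (received_datetime, message_id) tuples.

```python
from datetime import datetime, timedelta
from operator import itemgetter


def select_training_set(ham, spam, now, spam_max_age=timedelta(days=90),
                        ham_per_spam=1.0):
    """Pick ham/spam for initial training (hypothetical helper)."""
    if ham_per_spam <= 0:
        raise ValueError("ham_per_spam must be positive")
    by_date = itemgetter(0)
    # Newest first, so any trimming below drops the oldest messages.
    ham_sorted = sorted(ham, key=by_date, reverse=True)
    # Spam changes quickly: only keep spam received within the window.
    recent_spam = sorted((m for m in spam if now - m[0] <= spam_max_age),
                         key=by_date, reverse=True)
    # Cap spam so the available ham can still meet the target ratio...
    n_spam = min(len(recent_spam), int(len(ham_sorted) / ham_per_spam))
    # ...then take just enough ham to match it.
    n_ham = min(len(ham_sorted), round(n_spam * ham_per_spam))
    return ham_sorted[:n_ham], recent_spam[:n_spam]


now = datetime(2013, 4, 26)
ham = [(now - timedelta(days=30 * i), f"ham{i}") for i in range(20)]
spam = [(now - timedelta(days=10 * i), f"spam{i}") for i in range(30)]
h, s = select_training_set(ham, spam, now)
print(len(h), len(s))  # 10 10
```

Only 10 of the 30 spam fall inside the 90-day window, so the ham side is trimmed to match at 1/1 instead of learning twice as much ham as spam.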


> or do I need a rest?

Dunno. Got a beer near you?


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
