On Fri, 2013-04-26 at 21:25 -0400, Joe Acquisto-j4 wrote:
> Well, right now, there are no bayes hits at all. I cleared bayes to
> re-train, after correcting for a botched initial scheme.
>
> While I am getting a fair amount of missed spam, there is very little
> mis-classified.
>
> So I am looking for a way to speed up learning.

Initial training. Train on existing, verified corpora.

> So, I could just feed a bunch of good mail, to --ham, and spam that is
> correctly marked as spam as well as missed spam, to --spam?

Yes.

Bayes by default will not be used for scoring (it does learn, though),
unless at least 200 spam and 200 ham have been learned. So by training,
you can have Bayes kick in earlier.

Ham usually does not change much over time. Spam does, significantly.
Training on 1000 ham received over the last months, years, whatever, is
thus generally OK. You'd want to limit the time span for training spam,
though.

And it is good advice to keep the initial training corpora at a ratio
roughly resembling your ham/spam ratio, or maybe 1/1. (At this point,
we're approaching voodoo. Learning 10 times more ham than spam is most
likely a bad choice, though.)

> or do I need a rest?

Dunno. Got a beer near you?

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
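
The corpus selection described above (any-age ham, recent spam only, ratio near 1/1, 200 of each before Bayes scores) could be sketched roughly as follows. This is only an illustration, not part of SpamAssassin: the function names, the 90-day spam window and the `max_ham_per_spam` knob are assumptions made up for the example. The selected files would then be fed to `sa-learn --ham` and `sa-learn --spam` respectively.

```python
from datetime import datetime, timedelta

# Default threshold from the post: Bayes is not used for scoring until at
# least this many ham AND this many spam messages have been learned.
MIN_LEARNED_EACH = 200


def bayes_scoring_active(nham, nspam, min_each=MIN_LEARNED_EACH):
    """Return True once both learned counts reach the threshold."""
    return nham >= min_each and nspam >= min_each


def pick_training_set(ham, spam, now, spam_max_age_days=90, max_ham_per_spam=1.0):
    """Pick messages for initial training.

    ham, spam: lists of (received: datetime, path: str) tuples.
    spam_max_age_days and max_ham_per_spam are illustrative assumptions,
    not values recommended by the post or by SpamAssassin.
    """
    cutoff = now - timedelta(days=spam_max_age_days)
    # Spam changes quickly, so keep only recent spam.
    recent_spam = [m for m in spam if m[0] >= cutoff]
    # Ham is stable, so any age is acceptable; cap the count so the
    # ham/spam ratio stays at or below the chosen bound, newest first.
    ham_cap = int(len(recent_spam) * max_ham_per_spam)
    newest_ham = sorted(ham, key=lambda m: m[0], reverse=True)[:ham_cap]
    return newest_ham, recent_spam
```

With the default bound of 1.0 this yields at most as many ham as recent spam. Raising `max_ham_per_spam` moves the mix toward your real mail ratio, which is where the voodoo starts.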