On Mon, 2011-11-21 at 23:31 +0100, Jesper Wallin wrote:
> I also noticed that my old database only had 11k tokens while the new 
> one got about 60k (both the old and new server has hapaxes enabled and 
> was trained using a corpus of about 600 spam and 200 ham)

Is that "old" database the original one from the previous system, or old
as in "before learning from scratch", but *after* migrating the db?

I'd guess the latter. 11k tokens is terribly low and, as you just
noticed, even fewer than learning a handful of mails from scratch yields.

Are you sure the database conversion went cleanly?


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
