On Jul 16, 2003 09:34 pm, Simon Byrnand wrote:
> Anybody have any suggestions why almost all the ham I manually
> train won't budge below BAYES_30 ?
I think you should suggest to your correspondents that they become more literate. :-) I just took a look at the ham in my inbox... of 160 messages, 104 had BAYES_01, 27 had BAYES_10, 19 had BAYES_20, 10 had BAYES_30, and none had higher. Hard to say why your mileage is varying so much, but maybe you can run Bayesian analysis on individual ham messages and see which tokens are scoring relatively high.
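One rough way to see which tokens are pulling a ham message upward is to score each token on its own and sort. The sketch below is hypothetical: the function names, the toy token database and its counts are made up, and it uses a simplified Graham-style ratio (each count normalised by the number of trained spam or ham messages). It is not SpamAssassin's actual Bayes code, which uses Robinson-style smoothing and chi-square combining.

```python
# Hypothetical sketch, NOT SpamAssassin's implementation: rank a message's
# tokens by a simplified, Graham-style per-token spam probability.

def token_spam_prob(spam_count, ham_count, nspam, nham):
    """Probability that a message containing the token is spam, with each
    count normalised by how many spam/ham messages were trained."""
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    if spam_ratio + ham_ratio == 0:
        return 0.5  # token never seen in training: treat as neutral
    return spam_ratio / (spam_ratio + ham_ratio)


def spammiest_tokens(tokens, db, nspam, nham, top=5):
    """Return (token, probability) pairs for the message's distinct tokens,
    highest spam probability first."""
    scored = []
    for tok in set(tokens):
        spam_count, ham_count = db.get(tok, (0, 0))
        scored.append((tok, token_spam_prob(spam_count, ham_count, nspam, nham)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


# Toy database: token -> (spam messages containing it, ham messages containing it)
db = {"free": (900, 100), "meeting": (50, 950), "invoice": (300, 300)}
msg = "free invoice for the meeting".split()
ranking = spammiest_tokens(msg, db, nspam=1000, nham=1000, top=len(set(msg)))
print(ranking)
```

Tokens at the top of such a list in a ham message are the ones that manual training would need to pull back toward the ham side.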
My hunch is that auto-learning waters down the effectiveness of manual training. Our Bayes database is now up to nearly 60,000 spam and 60,000 ham, and I suspect that the token counts for common words are quite large, so training on an individual message has a correspondingly small effect compared with what it would have if I had only, say, 2,000 spam and 2,000 ham.
Anyone agree with this theory?
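The arithmetic behind this hunch can be sketched with the same simplified, hypothetical per-token formula (not SpamAssassin's real calculation). Assume a token appears in 5% of each class in both databases, then train one extra ham message containing it and measure how far the token's spam probability moves. The counts are illustrative only.

```python
# Hypothetical illustration of the "large counts dilute manual training" idea,
# using a simplified Graham-style token probability, not SpamAssassin's code.

def token_spam_prob(spam_count, ham_count, nspam, nham):
    """Spam probability of a token, normalised by trained message counts."""
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    return spam_ratio / (spam_ratio + ham_ratio)


def shift_after_one_ham(spam_count, ham_count, nspam, nham):
    """How much one extra trained ham message (containing the token)
    lowers the token's spam probability."""
    before = token_spam_prob(spam_count, ham_count, nspam, nham)
    after = token_spam_prob(spam_count, ham_count + 1, nspam, nham + 1)
    return before - after


# Same relative frequency (token in 5% of each class) in both databases.
small = shift_after_one_ham(100, 100, 2000, 2000)
large = shift_after_one_ham(3000, 3000, 60000, 60000)
print(f"2,000/2,000 db:   {small:.6f}")
print(f"60,000/60,000 db: {large:.6f}")
print(f"ratio: {small / large:.1f}")
```

Under these assumptions the single training step moves the token roughly 30 times less in the larger database. The effect depends on the individual token's counts rather than on database size as such: rare tokens still shift noticeably even in a large database.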
Regards, Simon
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk