Yes. I have been using Bayes since about the day Paul Graham published his algorithm. I have always trained only on messages I knew were spam (from trollboxes) or ham (hand-picked). I found that my filter, which I still maintain, was so much more effective than SA's autolearn that I disabled SA's Bayes filter.
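For readers unfamiliar with the approach described above, here is a minimal sketch of per-token scoring roughly along the lines of Paul Graham's "A Plan for Spam". It counts tokens in hand-picked spam and ham corpora, doubles ham counts to bias against false positives, treats rare or unseen tokens as 0.4, and combines the 15 probabilities farthest from 0.5, with 0.9 as the spam threshold. The tokenizer and every function name here (`tokenize`, `train`, `token_prob`, `spam_prob`) are hypothetical simplifications. This is not SpamAssassin's Bayes code and not the filter mentioned above.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (an assumption): lowercase, and treat letters,
    # digits, dollar signs, apostrophes and dashes as token characters.
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(messages: list[str]) -> Counter:
    # Count every token occurrence across a hand-picked corpus.
    counts: Counter = Counter()
    for msg in messages:
        counts.update(tokenize(msg))
    return counts


def token_prob(tok: str, good: Counter, bad: Counter, ngood: int, nbad: int) -> float:
    # ngood / nbad are the numbers of ham / spam messages trained (both > 0).
    g = 2 * good[tok]  # ham counts doubled to bias against false positives
    b = bad[tok]
    if g + b < 5:
        return 0.4  # rare or unseen tokens get a slightly hammy default
    pg = min(1.0, g / ngood)
    pb = min(1.0, b / nbad)
    # Clamp so that no single token is ever fully decisive.
    return max(0.01, min(0.99, pb / (pg + pb)))


def spam_prob(text: str, good: Counter, bad: Counter, ngood: int, nbad: int,
              n_interesting: int = 15) -> float:
    probs = [token_prob(t, good, bad, ngood, nbad) for t in set(tokenize(text))]
    # Keep only the tokens whose probabilities are farthest from 0.5.
    top = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    prod = math.prod(top)
    inv = math.prod(1.0 - p for p in top)
    return prod / (prod + inv)


if __name__ == "__main__":
    spam = ["buy cheap viagra now"] * 5
    ham = ["meeting tomorrow about the project"] * 5
    bad, good = train(spam), train(ham)
    score = spam_prob("cheap viagra", good, bad, len(ham), len(spam))
    print("spam" if score > 0.9 else "ham", round(score, 4))
```

The point of the sketch is that every probability comes straight from whatever was fed to `train`, so the quality of the hand-picked corpora directly sets the quality of the filter.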
I suspect that maybe autolearn would work well if you had a sizable, modern and accurate corpus to start with and then autolearned from there.

Fox

----- Original Message -----
From: "Simon Byrnand" <[EMAIL PROTECTED]>
To: "Barry McLarnon" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, July 17, 2003 7:12 PM
Subject: Re: [SAtalk] Trouble training bayes ?

> At 14:16 17/07/03 -0400, Barry McLarnon wrote:
> >On Jul 16, 2003 09:34 pm, Simon Byrnand wrote:
> > > Anybody have any suggestions why almost all the ham I manually
> > > train won't budge below BAYES_30 ?
> >
> >I think you should suggest to your correspondents that they become
> >more literate. :-) I just took a look at the ham in my inbox... of
> >160 messages, 104 had BAYES_01, 27 had BAYES_10, 19 had BAYES_20,
> >10 had BAYES_30, and none had higher. Hard to say why your mileage
> >is varying so much, but maybe you can run Bayesian analysis on
> >individual ham messages and see which tokens are scoring relatively
> >high.
>
> My hunch is that auto-learning waters down the effectiveness of manual
> training. Our Bayes database is now up to nearly 60,000 spam and 60,000
> ham, and I suspect that the token numbers for common words are quite large,
> therefore training on individual messages has a correspondingly small
> effect compared to if I only had say 2,000 spam and 2,000 ham.
>
> Anyone agree with this theory ?
>
> Regards,
> Simon
>
> -------------------------------------------------------
> This SF.net email is sponsored by: VM Ware
> With VMware you can run multiple operating systems on a single machine.
> WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines at the
> same time.
> Free trial click here: http://www.vmware.com/wl/offer/345/0
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
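Simon's dilution theory can be checked with simple arithmetic. The sketch below assumes a simplified per-token probability (the token's relative frequency in spam divided by the sum of its relative frequencies in spam and ham; this is not SpamAssassin's exact formula) and a hypothetical token that appears in 10% of spam and 5% of ham. It measures how much one additional hand-trained ham containing that token lowers the token's probability in a 2,000 + 2,000 database versus a 60,000 + 60,000 one. The function names are hypothetical.

```python
def token_spamminess(spam_hits: int, n_spam: int, ham_hits: int, n_ham: int) -> float:
    # Simplified per-token probability (an assumption, not SpamAssassin's
    # exact formula): relative frequency in spam over the sum of the
    # relative frequencies in spam and ham.
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


def shift_from_one_ham(n_msgs: int, spam_rate: float = 0.10, ham_rate: float = 0.05) -> float:
    # Database of n_msgs spam and n_msgs ham; the hypothetical token appears
    # in spam_rate of the spam and ham_rate of the ham.
    spam_hits = round(spam_rate * n_msgs)
    ham_hits = round(ham_rate * n_msgs)
    before = token_spamminess(spam_hits, n_msgs, ham_hits, n_msgs)
    # Hand-train one more ham message that contains the token.
    after = token_spamminess(spam_hits, n_msgs, ham_hits + 1, n_msgs + 1)
    return before - after


if __name__ == "__main__":
    for n in (2_000, 60_000):
        print(f"{n} spam + {n} ham: one trained ham lowers p(token) by {shift_from_one_ham(n):.5f}")
```

With these assumed rates the shift is roughly 0.0021 for the small database and roughly 0.00007 for the large one, about thirty times smaller, which matches the ratio of the database sizes. In this simplified model each trained message adds one count regardless of whether it was learned manually or automatically, so a single manually trained message moves common tokens less and less as the totals grow.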