Yes.  I have been using Bayes since about the day Paul Graham published
his algorithm.  I have always hand-picked the messages I train on: spam I
knew was spam (from trollboxes) and ham I picked by hand.  I found that
filter, which I still maintain, was so much more effective than SA autolearn
that I disabled SA's Bayes filter.

I suspect autolearn might work well if you started with a sizable, modern
and accurate corpus and then autolearned from there.
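The dilution theory in the quoted message can be checked with a little
arithmetic.  This is a minimal sketch, assuming the per-token probability
formula from Paul Graham's "A Plan for Spam" (ham counts doubled, rare
tokens skipped, result clamped to 0.01..0.99).  SpamAssassin's own Bayes
code uses a different, Robinson-style calculation, so these are not the
numbers SA would produce.  The token frequencies (5% of ham, 5% of spam)
and the helper names are made up for illustration.

```python
# Hypothetical illustration: Graham-style token probability, NOT
# SpamAssassin's actual BAYES_* calculation.

def graham_token_prob(good_count, bad_count, n_good, n_bad):
    """Per-token spam probability in the style of Graham's "A Plan for Spam".

    good_count / bad_count: number of ham / spam messages containing the token.
    n_good / n_bad: total ham / spam messages trained.
    Returns None for tokens too rare to score.
    """
    g = 2 * good_count          # ham counts doubled to bias against false positives
    b = bad_count
    if g + b < 5:
        return None             # too rare to trust
    bad_ratio = min(1.0, b / n_bad)
    good_ratio = min(1.0, g / n_good)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))


def shift_from_one_ham(ham_freq, spam_freq, n_msgs):
    """How far one extra hand-trained ham containing the token lowers its
    probability, in a corpus of n_msgs ham and n_msgs spam.

    ham_freq / spam_freq: assumed fraction of ham / spam containing the token.
    """
    good = round(ham_freq * n_msgs)
    bad = round(spam_freq * n_msgs)
    before = graham_token_prob(good, bad, n_msgs, n_msgs)
    # Training one more ham: the token's ham count and the ham total both grow.
    after = graham_token_prob(good + 1, bad, n_msgs + 1, n_msgs)
    return before - after


if __name__ == "__main__":
    for n in (2000, 60000):
        print(n, shift_from_one_ham(0.05, 0.05, n))
```

With these assumed numbers, one extra ham moves the token's probability
roughly 30 times less in a 60,000/60,000 corpus than in a 2,000/2,000 one,
which fits the theory.  It does not show whether autolearning is what made
the corpus that large in the first place.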

Fox



----- Original Message -----
From: "Simon Byrnand" <[EMAIL PROTECTED]>
To: "Barry McLarnon" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, July 17, 2003 7:12 PM
Subject: Re: [SAtalk] Trouble training bayes ?


> At 14:16 17/07/03 -0400, Barry McLarnon wrote:
> >On Jul 16, 2003 09:34 pm, Simon Byrnand wrote:
> > > Anybody have any suggestions why almost all the ham I manually
> > > train won't budge below BAYES_30 ?
> >
> >I think you should suggest to your correspondents that they become
> >more literate. :-)  I just took a look at the ham in my inbox... of
> >160 messages, 104 had BAYES_01, 27 had BAYES_10, 19 had BAYES_20,
> >10 had BAYES_30, and none had higher.  Hard to say why your mileage
> >is varying so much, but maybe you can run Bayesian analysis on
> >individual ham messages and see which tokens are scoring relatively
> >high.
>
> My hunch is that auto-learning waters down the effectiveness of manual
> training. Our Bayes database is now up to nearly 60,000 spam and 60,000
> ham, and I suspect that the token numbers for common words are quite
> large, therefore training on individual messages has a correspondingly
> small
> effect compared to if I only had say 2,000 spam and 2,000 ham.
>
> Anyone agree with this theory ?
>
> Regards,
> Simon
>
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: VM Ware
> With VMware you can run multiple operating systems on a single machine.
> WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines at the
> same time. Free trial click here: http://www.vmware.com/wl/offer/345/0
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
