I'm sure this is a silly question, but it's one I haven't been able to
find documentation on, and it has kept me from using the Bayes stuff in
SA.  Does it matter whether the header info is included in the e-mail
when it is submitted to sa-learn?  From what has been said in this
thread, it sounds like only the text of the message really matters.  The
whole point of the question is that I wasn't sure how to get my ham and
spam out of Outlook and into sa-learn.  Any thoughts?
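
The best I've come up with so far (untested) is to save the messages out
of Outlook in a plain RFC-822 format (e.g. .eml files, which keep the
full headers, rather than Outlook's binary .msg format) into separate
ham and spam folders, then loop over them with something like the sketch
below.  The folder names are just placeholders, and it assumes sa-learn
is on the PATH:

# Sketch only (untested): walk folders of exported ham and spam and
# feed each message file to sa-learn.  Assumes sa-learn is on the PATH
# and the messages were saved in plain RFC-822 form; folder names are
# placeholders.
import os
import subprocess

def learn_folder(folder, flag):
    # flag is "--ham" or "--spam"
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            subprocess.check_call(["sa-learn", flag, path])

learn_folder("exported/ham", "--ham")
learn_folder("exported/spam", "--spam")

No idea if that's the sanest way to do it, hence the question.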

J.

Jeffrey J Funk
President/CEO
Badger Internet, Inc.
[EMAIL PROTECTED]
608.661.4240

-----Original Message-----
From: Fox Flanders [mailto:[EMAIL PROTECTED] 
Sent: Friday, July 18, 2003 6:46 AM
To: Barry McLarnon; Simon Byrnand
Cc: [EMAIL PROTECTED]
Subject: Re: [SAtalk] Trouble training bayes ?

Yes.  I have been using Bayes since about the day Paul Graham published
his algorithm.  I have always hand-picked the messages I knew were spam
(from trollboxes) or ham (picked by hand).  I found that filter, which I
still maintain, was so much more effective than SA autolearn that I
disabled SA's bayes filter.

I suspect autolearn would work well if you had a sizable, modern and
accurate corpus to start with and then autolearned from there.

Fox



----- Original Message -----
From: "Simon Byrnand" <[EMAIL PROTECTED]>
To: "Barry McLarnon" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, July 17, 2003 7:12 PM
Subject: Re: [SAtalk] Trouble training bayes ?


> At 14:16 17/07/03 -0400, Barry McLarnon wrote:
> >On Jul 16, 2003 09:34 pm, Simon Byrnand wrote:
> > > Anybody have any suggestions why almost all the ham I manually
> > > train won't budge below BAYES_30 ?
> >
> >I think you should suggest to your correspondents that they become
> >more literate. :-)  I just took a look at the ham in my inbox... of
> >160 messages, 104 had BAYES_01, 27 had BAYES_10, 19 had BAYES_20,
> >10 had BAYES_30, and none had higher.  Hard to say why your mileage
> >is varying so much, but maybe you can run Bayesian analysis on
> >individual ham messages and see which tokens are scoring relatively
> >high.
>
> My hunch is that auto-learning waters down the effectiveness of manual
> training. Our Bayes database is now up to nearly 60,000 spam and 60,000
> ham, and I suspect that the token counts for common words are quite
> large, so training on individual messages has a correspondingly small
> effect compared to if I only had, say, 2,000 spam and 2,000 ham.
>
> Anyone agree with this theory ?
>
> Regards,
> Simon
>
>
>
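
To put some rough, made-up numbers on Simon's theory (using a naive
per-token spam probability, not SpamAssassin's actual Robinson/chi-squared
combining math), one extra manually trained ham barely moves a common
token once the counts get big:

# Hypothetical token counts; naive probability = spam_hits / (spam_hits + ham_hits).
# Shows how much one additional ham sighting of a token shifts its score
# in a small database versus a large one.
def token_spam_prob(spam_hits, ham_hits):
    return spam_hits / float(spam_hits + ham_hits)

for label, spam_hits, ham_hits in [("small corpus (2,000 + 2,000)", 40, 40),
                                   ("large corpus (60,000 + 60,000)", 1200, 1200)]:
    before = token_spam_prob(spam_hits, ham_hits)
    after = token_spam_prob(spam_hits, ham_hits + 1)   # learn one more ham
    print("%-30s %.4f -> %.4f" % (label, before, after))

With these made-up counts, the small database shifts roughly 30 times
further per manually trained message than the large one, which would be
consistent with what Simon is seeing.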