RE: [SAtalk] Bayes mis-learning problem

Larry Gilson Mon, 19 Jan 2004 13:45:42 -0800

Look at:

http://useast.spamassassin.org/doc/Mail_SpamAssassin_Conf.html#learning%20options


bayes_ignore_header header_name

If you receive mail filtered by upstream mail systems, like a spam-filtering
ISP or mailing list, and that service adds new headers (as most of them do),
these headers may provide inappropriate cues to the Bayesian classifier,
allowing it to take a ``short cut''. To avoid this, list the headers using
this setting. Example: 
        bayes_ignore_header X-Upstream-Spamfilter
        bayes_ignore_header X-Upstream-SomethingElse


An example:
http://www.stearns.org/doc/spamassassin-setup.current.html#autoreporting



--Larry



> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:spamassassin-talk-[EMAIL PROTECTED]
> On Behalf Of Ross Vandegrift
> Sent: Monday, January 19, 2004 2:53 PM
> To: [EMAIL PROTECTED]
> Subject: [SAtalk] Bayes mis-learning problem
> 
> Hey everyone,
> 
>       We're currently coping with a false-positive crisis that's
> sweeping our email with 2.60, mostly due to scores of the Bayes filter.
> We run SA site-wide on an incoming MX host, so individual users do not
> have access to train the Bayes database.  Moreover, our primary client
> program is Pegasus Mail for DOS, which provides no real way to get raw
> messages out unmodified (it hoses CR/LF, forces line wraps, and cats
> MIME parts together).
> 
>       So I'm going through some of our Bayes tokens trying to decide
> if I should dump the current database and start over.  I've noticed
> really bad things like this:
> 
> 0.892 381     112     1069183901      HTo:[EMAIL PROTECTED]
> 0.905 75      19      1069183901      HTo:[EMAIL PROTECTED]
> 0.997 17      0       1069183901      HTo:[EMAIL PROTECTED]
> 
> This looks really horrible!  Just by virtue of my boss's email having a
> "To: [EMAIL PROTECTED]", it'll almost certainly be tagged as spam.  The
> database is trained with nham=13685 and nspam=5652.  Autolearning is
> enabled and has default thresholds.
> 
> This is alarming at first.  But when I think about it and realize
> that most of us get more spam than ham, Bayes is right.  Unfortunately,
> that's really, really the wrong thing to do.  Is there a way to exempt
> some headers from processing?
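
For what it's worth, the probabilities in the token dump quoted above look
consistent with a plain spam/ham frequency ratio. Here is a minimal Python
sketch, assuming the dump columns are probability, spam count, ham count,
last-access time, and token, and assuming no smoothing. The function name
raw_token_prob is made up for illustration; this is not SpamAssassin's
actual code.

```python
def raw_token_prob(spam_hits, ham_hits, nspam, nham):
    """Unsmoothed estimate of P(spam | token) from per-corpus frequencies.

    Hypothetical helper for illustration only, not SpamAssassin's code.
    """
    spam_freq = spam_hits / nspam  # fraction of learned spam containing the token
    ham_freq = ham_hits / nham     # fraction of learned ham containing the token
    return spam_freq / (spam_freq + ham_freq)


# Corpus sizes reported in the quoted message.
NSPAM, NHAM = 5652, 13685

# (spam count, ham count) for the three HTo: tokens in the quoted dump.
for spam_hits, ham_hits in [(381, 112), (75, 19), (17, 0)]:
    prob = raw_token_prob(spam_hits, ham_hits, NSPAM, NHAM)
    print(spam_hits, ham_hits, round(prob, 3))
```

The first two rows come out at 0.892 and 0.905, matching the dump. The third
comes out at 1.0 rather than 0.997, so some smoothing for low-count tokens is
presumably applied on top of the raw ratio. Either way, the value depends
only on how often the address shows up in learned spam versus learned ham,
which is why a heavily spammed recipient address scores so high.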


