Re: SpamAssassin's bayes mechanism and message headers

Matt Kettler Wed, 18 Mar 2009 16:50:48 -0700

Jeff Mincy wrote:
>    From: Matt Kettler <mkettler...@verizon.net>
>    Date: Tue, 17 Mar 2009 21:30:02 -0400
>    
>    fl...@pbartels.info wrote:
>    > Hello,
>    >
>    > Instead of disabling a lot of possibly-set message headers using
>    > "bayes_ignore_header" and ending up with strange configs like:
>    >
>    > bayes_ignore_header Return-Path
>    ...
>    > (found on the net)
>    Where?
>    >
>    > Shouldn't SpamAssassin's bayes mechanism just ignore the complete
>    > message header and just look at the body?
>    > This seems useful in my opinion.
>    It seems like a very misguided idea to me.
>    
>    Is there any reason to think headers make bad tokens?
>    Do you have any test data showing this improves your bayes accuracy?
>
> Yes - I think some headers make extremely bad tokens for bayes, for
> example the X-Mailer/User-Agent headers. 40% of the spam I get
> claims to have Microsoft Outlook as its X-Mailer, so bayes rapidly
> determines that *UAMicrosoft (etc.) is an extremely strong token.
> These *UA tokens were enough to push a short ham message to BAYES_99.
> When I added a bayes_ignore_header, the score dropped to ~BAYES_40.
>   
That seems rather extraordinarily strange. Did the messages match no
other tokens at all? (i.e., did you run it through spamassassin -D bayes
before and after?)


I'd be very interested in what's going on there, because it makes very
little sense unless the message really matched very, very little other
existing training.
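To illustrate why one strong token should rarely dominate a message that also matches other training, here is a minimal sketch assuming Robinson-style chi-square (Fisher) combining. SpamAssassin's default Bayes combiner works along these lines, but this is not its code: it skips SpamAssassin's token selection and probability smoothing, and the helper names `chi2q` and `combine` and the token probabilities are hypothetical.

```python
import math


def chi2q(x2: float, dof: int) -> float:
    """Survival function P(chi^2 >= x2) for an even number of degrees of freedom.

    Uses the closed form for even dof; fine for the small inputs used here,
    though exp(-m) underflows for very large x2.
    """
    assert dof % 2 == 0, "dof must be even"
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs: list[float]) -> float:
    """Combine per-token spam probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # Spam evidence: grows when many tokens have probabilities close to 1.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Ham evidence: grows when many tokens have probabilities close to 0.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + s - h) / 2.0


ua_token = 0.99  # hypothetical strong "*UA..." token probability
print(combine([ua_token]))               # only the UA token: about 0.99
print(combine([ua_token] + [0.2] * 20))  # plus 20 mildly hammy tokens: far below 0.5
```

With the UA token as the only evidence, the combined score sits at the token's own 0.99. Once the same message also matches twenty mildly hammy tokens, the score drops far below 0.5. A result near BAYES_99 for a ham message therefore suggests that very little of the rest of the message matched existing training.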
