From: Matt Kettler <mkettler...@verizon.net>
Date: Wed, 18 Mar 2009 19:49:53 -0400

Jeff Mincy wrote:
> From: Matt Kettler <mkettler...@verizon.net>
> Date: Tue, 17 Mar 2009 21:30:02 -0400
>
> fl...@pbartels.info wrote:
> > Hello,
> >
> > instead of disabling a lot of possibly set message headers using
> > "bayes_ignore_header" and ending up in strange configs like:
> >
> > bayes_ignore_header Return-Path
> ...
> > (found on the net)
> Where?
> >
> > shouldn't SpamAssassin's bayes mechanism just ignore the complete
> > message header and just look at the body?
> > This seems useful in my opinion.
> It seems like a very misguided idea to me.
>
> Is there any reason to think headers make bad tokens?
> Do you have any test data showing this improves your bayes accuracy?
>
> Yes - I think some headers make extremely bad tokens for bayes, for
> example the X-Mailer/User-Agent headers. 40% of the spam I get
> claims to have Microsoft Outlook as an X-Mailer. So bayes rapidly
> determines that *UAMicrosoft (etc) is an extremely strong token.
> These *UA tokens were enough to push a short ham message to BAYES_99.
> When I added a bayes_ignore_header the score dropped to ~BAYES_40
>
That seems rather extraordinarily strange. Did the messages match no
other tokens at all? (ie: did you run it through spamassassin -D bayes
before and after?)

This was the X-Spam-Bayes header that was added at the time:

  X-Spam-Bayes: bayes=1.0000, N=27(19-0+13), ham=(), spam=(HTo:U*mincy, HTo:D*com, HTo:D*rcn.com, H*F:D*net, H*UA:Build)
This header was added using:

  add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

So, there are 27 tokens, 0 hammy, 13 spammy.

I'd be very interested in what's going on there, because it makes very
little sense unless the message really matched very, very little other
existing training.

3 of the top 5 spammy tokens (HTo:U*mincy, HTo:D*com, HTo:D*rcn.com)
come from the To: mi...@rcn.com header. The H*UA:Build came from a
'X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)' header.
As I recall, there were various H*UA:Outlook etc. tokens. Bayes was
100.000% sure that this message was spam based on the To, X-Mailer,
and From headers.

The envelopes of all email messages that I read at home are addressed
to mi...@rcn.com (ignoring for the moment that mi...@starpower.net
also happens to get to me). The 'To:' header is going to be either
mi...@rcn.com, or some made-up email address that will never be
repeated, or my email address. So Bayes will see my email address in
both spam and ham. At the time more than 80% of the email I was getting
at rcn.com was spam, so To: mi...@rcn.com was turned into three strong
spam tokens. My real mi...@rcn.com email address in the To header says
nothing about the spamminess of the message. (This is in contrast to
the mi...@starpower.net email address, which almost certainly indicates
spam and has been added to blacklist_to.)

So my solution was to add 'bayes_ignore_header To From' and use
blacklist_to/blacklist_from for the suspect email addresses. I came up
with similar justification for adding 'bayes_ignore_header X-Mailer'.

The body of the message was a single sentence asking me about my
primary music software. If you want to see more detail let's take it
off the public mailing list.

-jeff
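
For anyone who wants to repeat this kind of check on their own mail, the X-Spam-Bayes value quoted above can be pulled apart mechanically. What follows is only a minimal Python sketch, assuming the header follows exactly the add_header template shown in this message. The function name parse_bayes_header and the dictionary keys are hypothetical, chosen to mirror the template tags, and treating a leading "H" as the mark of a header-derived token is an assumption based solely on the five tokens listed here.

```python
import re

# Layout mirrors the add_header template quoted in the message:
#   bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_),
#   ham=(...), spam=(...)
BAYES_RE = re.compile(
    r"bayes=(?P<bayes>[\d.]+),\s*"
    r"N=(?P<tc>\d+)\((?P<learned>\d+)-(?P<hammy>\d+)\+(?P<spammy>\d+)\),\s*"
    r"ham=\((?P<ham>[^)]*)\),\s*"
    r"spam=\((?P<spam>[^)]*)\)"
)


def parse_bayes_header(value):
    """Split an X-Spam-Bayes value into counts and token lists (hypothetical helper)."""
    m = BAYES_RE.search(value)
    if m is None:
        raise ValueError("value does not match the assumed X-Spam-Bayes layout")

    def split_tokens(text):
        # Tokens are comma-separated; an empty "()" yields an empty list.
        return [t.strip() for t in text.split(",") if t.strip()]

    return {
        "bayes": float(m.group("bayes")),
        "tc": int(m.group("tc")),            # _BAYESTC_
        "learned": int(m.group("learned")),  # _BAYESTCLEARNED_
        "hammy": int(m.group("hammy")),      # _BAYESTCHAMMY_
        "spammy": int(m.group("spammy")),    # _BAYESTCSPAMMY_
        "ham_tokens": split_tokens(m.group("ham")),
        "spam_tokens": split_tokens(m.group("spam")),
    }


# The header value exactly as quoted in the message.
header = (
    "bayes=1.0000, N=27(19-0+13), ham=(), "
    "spam=(HTo:U*mincy, HTo:D*com, HTo:D*rcn.com, H*F:D*net, H*UA:Build)"
)
info = parse_bayes_header(header)

# Assumption: tokens that start with "H" were derived from message headers.
header_tokens = [t for t in info["spam_tokens"] if t.startswith("H")]
print(info["tc"], info["hammy"], info["spammy"])
print(header_tokens)
```

Comparing header_tokens against the full spam_tokens list is a quick way to see whether the top spammy tokens for a message come from headers rather than from the body.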