Bob Proulx wrote:
> I am guessing that H*c is a header and some specific token.
> Is there a key somewhere that will help decode these?

From Bayes.pm:
%HEADER_NAME_COMPRESSION = (
    'Message-Id'               => '*m',
    'Message-ID'               => '*M',
    'Received'                 => '*r',
    'User-Agent'               => '*u',
    'References'               => '*f',
    'In-Reply-To'              => '*i',
    'From'                     => '*F',
    'Reply-To'                 => '*R',
    'Return-Path'              => '*p',
    'Return-path'              => '*rp',
    'X-Mailer'                 => '*x',
    'X-Authentication-Warning' => '*a',
    'Organization'             => '*o',
    'Organisation'             => '*o',
    'Content-Type'             => '*c',
    'X-Spam-Relays-Trusted'    => '*RT',
    'X-Spam-Relays-Untrusted'  => '*RU',
);

So H*r = Received: header, etc.

> [15528] dbg: bayes: token 'H*MI:OEA0023' => 0.985096774193548
> [15528] dbg: bayes: token 'H*M:OEA0023' => 0.985096774193548
> [15528] dbg: bayes: token 'H*UA:Express' => 0.985060557114832
> [15528] dbg: bayes: token 'H*x:Express' => 0.985059973253254
> [15528] dbg: bayes: token 'HX-MimeOLE:V6.00.2900.2962' => 0.976898908840907
> [15528] dbg: bayes: token 'HX-MimeOLE:MimeOLE' => 0.976313886128059
> [15528] dbg: bayes: token 'HX-MSMail-Priority:Normal' => 0.974305670960733
> [15528] dbg: bayes: token 'HX-MimeOLE:Microsoft' => 0.959224439139177
> [15528] dbg: bayes: token 'HX-MimeOLE:Produced' => 0.959178732453666
>
> It has really learned Outlook as a spam source. But there should be
> plenty of valid messages to have offset these. I keep running
> sa-learn --ham on all valid messages hoping that it would offset the
> spam ones. As you can see from the numbers there are 150,000 messages
> and apparently all in the last 2.34 days too. (But that does not
> quite make sense to me either.)
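As an aside, the compression map above can be inverted to turn the debug tokens back into readable header names. A minimal Python sketch (the mapping is copied by hand from the Bayes.pm excerpt; prefixes not in the map, like the uncompressed HX-MimeOLE headers, fall through unchanged):

```python
# Inverse of SpamAssassin's %HEADER_NAME_COMPRESSION (from Bayes.pm).
HEADER_NAME_DECOMPRESSION = {
    '*m': 'Message-Id',
    '*M': 'Message-ID',
    '*r': 'Received',
    '*u': 'User-Agent',
    '*f': 'References',
    '*i': 'In-Reply-To',
    '*F': 'From',
    '*R': 'Reply-To',
    '*p': 'Return-Path',
    '*rp': 'Return-path',
    '*x': 'X-Mailer',
    '*a': 'X-Authentication-Warning',
    '*o': 'Organization',          # also covers 'Organisation'
    '*c': 'Content-Type',
    '*RT': 'X-Spam-Relays-Trusted',
    '*RU': 'X-Spam-Relays-Untrusted',
}

def decode_token(token):
    """Turn a debug token like 'H*M:OEA0023' into ('Message-ID', 'OEA0023')."""
    if not token.startswith('H'):
        return (None, token)               # not a header token
    name, _, value = token[1:].partition(':')
    return (HEADER_NAME_DECOMPRESSION.get(name, name), value)

print(decode_token('H*M:OEA0023'))         # ('Message-ID', 'OEA0023')
print(decode_token('HX-MimeOLE:Microsoft'))  # ('X-MimeOLE', 'Microsoft')
```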
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:lists.example.com' => 0.950917490471412
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:sk:monty-p' => 0.95091594711816
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:199.232.76.173' => 0.95091594711816
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:envfrom' => 0.950880625609595
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:auth' => 0.950880625609595
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:helo' => 0.950880625609595
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:intl' => 0.950880625609595
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:ident' => 0.950880625609595
> [15528] dbg: bayes: token 'HX-Spam-Relays-Internal:rdns' => 0.950880625609595
>
> It seems to have learned one of the trusted_network machines as a spam
> relay. Hmm... That seems like a bug.

Perhaps... either that, or you're doing your spam learning after this
machine has added its headers, but very little of your ham learning has
them.

> [8683] dbg: received-header: relay 199.232.76.173 trusted? yes internal? yes
>
>> That should at least let you know what it is your bayes DB has learned
>> that's bad.
>>
>> If it's not too horrible you might be able to use sa-learn --backup to
>> dump the DB, edit it by hand, and sa-learn --restore it.
>
> Hmm... That is an idea. A good suggestion. Of course everything has
> been hashed, so I would need to reverse engineer them back to something
> meaningful, but that should be possible with a message to test against.
>
> I think the bayes is learning things from the MIME structure that it
> should not be learning, such as multipart/alternative. Is there a way
> to whitelist tokens so that they do not show up in the bayes at all?
>
>> However, you'd need to find the correct SHA1 of the offending tokens..
>> not sure if that will be in the debug output.
>
> Yes. Correlating one to the other is going to be a pain.
>
> Thanks for the suggestions.
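One way to take some of the pain out of that correlation would be to hash candidate plaintext tokens yourself and grep the backup dump for the result. A rough Python sketch, assuming the dump stores a hex SHA1 of the raw token text (some SpamAssassin versions store only a truncated digest, so the helper also accepts prefix/suffix matches; the dump filename is hypothetical):

```python
import hashlib

def token_sha1(token):
    """Hex SHA1 digest of a raw Bayes token string."""
    return hashlib.sha1(token.encode('utf-8')).hexdigest()

def could_match(token, db_token):
    """True if db_token could be a (possibly truncated) SHA1 of token.

    Assumption: some versions store only part of the digest, so
    prefix and suffix matches of the full SHA1 are accepted too.
    """
    digest = token_sha1(token)
    return (digest == db_token
            or digest.startswith(db_token)
            or digest.endswith(db_token))

# Example: scan an sa-learn --backup dump for a suspect token.
suspect = 'HX-Spam-Relays-Internal:lists.example.com'
# with open('bayes-backup.txt') as dump:        # filename is illustrative
#     for line in dump:
#         fields = line.split()
#         if fields and could_match(suspect, fields[-1]):
#             print(line.rstrip())
```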
>
> Bob