I blame SpamAssassin for these Bayes bypassing tricks. I had a custom Bayes solution working many months before it appeared in SpamAssassin. There was none of this bypassing crudola happening until SpamAssassin popularized Bayes :)
Now I get messages with a spam text/html MIME part and the Declaration of Independence as the text/plain MIME part to fool the Bayes filter. It is very effective at maiming my Bayes filter. Of course, I could start dropping text/plain MIME parts when a text/html part is present.

Fox

----- Original Message -----
From: "Daniel Quinlan" <[EMAIL PROTECTED]>
To: "Mike Batchelor" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, June 26, 2003 3:06 AM
Subject: Re: [SAtalk] Spammers sneaking lower Bayes scores

> Mike Batchelor <[EMAIL PROTECTED]> writes:
>
> > Note the random words within the <font> tags at the end of the spam. I
> > think they lowered its Bayes score, which dropped it below my threshold
> > overall. That, and the lack of any other text aside from the links...
>
> Yes, the random words in that email work against Bayes (I think this
> particular exploit has been well-known for a while, and spammers are
> definitely getting better at it). It's why I've never believed that
> Bayes is a panacea. This goes back to the SA philosophy: we don't rely
> on any one technique since no one technique is perfect. SpamAssassin
> only uses Bayes as a small subset of the rules.
>
> And while random words fool simple checksum systems like Razor1, they
> don't fool Razor2, DCC, or Pyzor. They also don't fool RBLs. This spam
> was lucky enough not to be listed anywhere yet, but most won't be.
>
> > Is this tactic likely to succeed for them, rendering our Bayesian
> > classifiers ineffective? What do you think?
>
> Well, it can work. (It does take a fairly smart spammer to pull it
> off.)
>
> Here's the list of tokens matched (sorted by probability). It does look
> like they've managed to construct a message with a very low score, all
> around.
>
> debug: bayes token 'bg.jpg' => 0.997298245614035
> debug: bayes token 'take-me-off' => 0.994923076923077
> debug: bayes token 'dairy' => 0.988731707317073
> debug: bayes token 'URI' => 0.965041112956667
> debug: bayes token 'studs' => 0.958
> debug: bayes token 'N:HX-Mail-Format-Warning:RFCNNNN' => 0.958
> debug: bayes token 'HX-Mail-Format-Warning:header' => 0.958
> debug: bayes token 'HX-Mail-Format-Warning:formatting' => 0.958
> debug: bayes token 'HX-Mail-Format-Warning:RFC2822' => 0.958
> debug: bayes token 'HX-Mail-Format-Warning:Bad' => 0.958
> debug: bayes token 'amazing' => 0.942726598001046
> debug: bayes token 'images' => 0.926845523863797
> debug: bayes token 'H*r:501' => 0.925317612750241
> debug: bayes token 'index.html' => 0.903306785729035
> debug: bayes token 'H*c:HHHH' => 0.882993935307079
> debug: bayes token 'N:H*M:NNNNNNNNNNNNNN' => 0.151858612440554
> debug: bayes token 'N:H*r:NNN' => 0.146710592234121
> debug: bayes token 'N:H*r:N.NN.N' => 0.143944051542205
> debug: bayes token 'H*r:8.12.2' => 0.131883948595905
> debug: bayes token 'N:H*M:NNNNN' => 0.107562836072879
> debug: bayes token 'N:HX-Sieve:N.N' => 0.0489090909090909
> debug: bayes token 'HX-Sieve:cmu-sieve' => 0.0489090909090909
> debug: bayes token 'verbally' => 0.0256190476190476
> debug: bayes token 'gels' => 0.0256190476190476
> debug: bayes token 'hairiness' => 0.0173548387096774
> debug: bayes token 'modulating' => 0.0131219512195122
> debug: bayes: score = 0.564805986629628
>
> (Any Bayesian classifier can produce this type of list once you build a
> corpus. And there's no magic to Bayes that prevents spammers from doing
> this too to figure out how many words need to be counterbalanced, etc.)
>
> Note that my Bayes database picked up on some tokens that were actually
> added by your personal software (like X-Sieve and some of the other
> tokens, like the Message-ID). If I remove those, my Bayes probability
> would go up.
> I didn't get this particular spam, so I don't know if
> Bayes would have worked for me or not. Probably not enough to catch it
> as spam, though.
>
> Some enhancements to Bayes might be in order.
>
> Daniel
>
> --
> Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
> http://www.pathname.com/~quinlan/    source consulting (looking for new work)
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: INetU
> Attention Web Developers & Consultants: Become An INetU Hosting Partner.
> Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
> INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
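
To see why padding a spam with rare, innocent-looking words pulls the score down, here is a rough Python sketch that combines a handful of the per-token probabilities from the debug output above. It assumes the simple naive-Bayes product rule from Paul Graham's "A Plan for Spam"; that is an illustration only, not the combining method SpamAssassin uses, so it will not reproduce the 0.5648 score in the debug output. The function name `combine` and the choice of tokens are made up for the example.

```python
import math


def combine(probs):
    """Naive-Bayes combination: prod(p) / (prod(p) + prod(1 - p)).

    Assumed formula for illustration (not SpamAssassin's combiner).
    Computed in log space so long token lists do not underflow.
    """
    probs = list(probs)
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    # Algebraically the same as prod(p) / (prod(p) + prod(1 - p)).
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Token probabilities copied from the debug output, rounded to 4 places.
spammy = {"bg.jpg": 0.9973, "take-me-off": 0.9949, "amazing": 0.9427}
padding = {"verbally": 0.0256, "gels": 0.0256,
           "hairiness": 0.0174, "modulating": 0.0131}

score_without = combine(spammy.values())
score_with = combine(list(spammy.values()) + list(padding.values()))

print(f"without padding words: {score_without:.4f}")
print(f"with padding words:    {score_with:.4f}")
```

Under this assumed rule, four rare "hammy" words are enough to drag three strongly spammy tokens below the halfway mark, which is the effect Mike describes.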
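
Fox's idea of dropping text/plain parts when a text/html part is present could be sketched like this. It is a minimal illustration using Python's standard `email` package, not how SpamAssassin or Fox's own filter handles MIME; the helper name `parts_for_bayes` and the sample message are hypothetical.

```python
from email import message_from_string
from email.message import Message


def parts_for_bayes(msg: Message) -> list[str]:
    """Return the text bodies that would be handed to a Bayes tokenizer.

    Hypothetical helper: if the message contains any text/html part,
    text/plain parts are skipped so plain-text padding is never tokenized.
    """
    text_parts = [p for p in msg.walk() if p.get_content_maintype() == "text"]
    has_html = any(p.get_content_type() == "text/html" for p in text_parts)

    bodies = []
    for part in text_parts:
        if has_html and part.get_content_type() == "text/plain":
            continue  # drop the plain-text alternative
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            bodies.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to a lossless byte mapping.
            bodies.append(payload.decode("latin-1"))
    return bodies


# Made-up sample: innocent plain text alongside a spammy HTML part.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

When in the Course of human events...
--XYZ
Content-Type: text/html; charset="us-ascii"

<html><body><a href="http://example.com/">amazing offer</a></body></html>
--XYZ--
"""

bodies = parts_for_bayes(message_from_string(RAW))
print(bodies)
```

The obvious trade-off is that a legitimate message whose HTML part is empty or junk would lose its only useful text, so a real filter would probably want a fallback.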