I blame SpamAssassin for these Bayes bypassing tricks.  I had a custom Bayes
solution working many months before it appeared in SpamAssassin.  There was
none of this bypassing crudola happening until SpamAssassin popularized
Bayes :)

Now I get messages with a spam text/html MIME part and the Declaration of
Independence as the text/plain MIME part to fool the Bayes filter.  It is
very effective at maiming my Bayes filter.  Of course, I could start dropping
text/plain MIME parts whenever a text/html part is present.
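
A minimal sketch of that idea, using Python's standard email package rather
than anything from SpamAssassin.  The function name parts_to_tokenize and the
SAMPLE message are made up for illustration, and the policy (ignore every
text/plain part as soon as one text/html part exists) is only the rough idea
described above, not a tested defense.

```python
from email import message_from_string


def parts_to_tokenize(msg):
    """Return the text bodies a Bayes tokenizer should be shown.

    Hypothetical policy: if the message has any text/html part, ignore
    all text/plain parts; otherwise fall back to the text/plain parts.
    """
    html, plain = [], []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body text of their own
        ctype = part.get_content_type()
        if ctype not in ("text/html", "text/plain"):
            continue  # skip images, attachments, and so on
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the header
            text = payload.decode("latin-1", errors="replace")
        (html if ctype == "text/html" else plain).append(text)
    return html if html else plain


# Illustrative message: innocent text/plain decoy, spammy text/html body.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "When in the Course of human events\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    "<a href='http://example.invalid/'>amazing studs</a>\n"
    "--b--\n"
)

bodies = parts_to_tokenize(message_from_string(SAMPLE))
```

The obvious cost is that a legitimate text/plain alternative is thrown away
too, and filler hidden inside the HTML part itself would still reach the
tokenizer.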

Fox

----- Original Message -----
From: "Daniel Quinlan" <[EMAIL PROTECTED]>
To: "Mike Batchelor" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, June 26, 2003 3:06 AM
Subject: Re: [SAtalk] Spammers sneaking lower Bayes scores


> Mike Batchelor <[EMAIL PROTECTED]> writes:
>
> > Note the random words within the <font> tags at the end of the spam.  I
> > think they lowered its Bayes score, which dropped it below my threshold
> > overall.  That, and the lack of any other text aside from the links...
>
> Yes, the random words in that email work against Bayes (I think this
> particular exploit has been well-known for a while, and spammers are
> definitely getting better at it).  It's why I've never believed that
> Bayes is a panacea.  This goes back to the SA philosophy: we don't rely
> on any one technique since no one technique is perfect.  SpamAssassin
> only uses Bayes as a small subset of the rules.
>
> And while random words fool simple checksum systems like Razor1, they
> don't fool Razor2, DCC, or Pyzor.  They also don't fool RBLs.  This spam
> was lucky enough not to be listed anywhere yet, but most won't be.
>
> > Is this tactic likely to succeed for them, rendering our Bayesian
> > classifiers ineffective?  What do you think?
>
> Well, it can work.  (It does take a fairly smart spammer to pull it
> off.)
>
> Here's the list of tokens matched (sorted by probability).  It does look
> like they've managed to construct a message with a very low score, all
> around.
>
> debug: bayes token 'bg.jpg' => 0.997298245614035
> debug: bayes token 'take-me-off' => 0.994923076923077
> debug: bayes token 'dairy' => 0.988731707317073
> debug: bayes token 'URI' => 0.965041112956667
> debug: bayes token 'studs' => 0.958
> debug: bayes token 'N:HX-Mail-Format-Warning:RFCNNNN' => 0.958
> debug: bayes token 'HX-Mail-Format-Warning:header' => 0.958
> debug: bayes token 'HX-Mail-Format-Warning:formatting' => 0.958
> debug: bayes token 'HX-Mail-Format-Warning:RFC2822' => 0.958
> debug: bayes token 'HX-Mail-Format-Warning:Bad' => 0.958
> debug: bayes token 'amazing' => 0.942726598001046
> debug: bayes token 'images' => 0.926845523863797
> debug: bayes token 'H*r:501' => 0.925317612750241
> debug: bayes token 'index.html' => 0.903306785729035
> debug: bayes token 'H*c:HHHH' => 0.882993935307079
> debug: bayes token 'N:H*M:NNNNNNNNNNNNNN' => 0.151858612440554
> debug: bayes token 'N:H*r:NNN' => 0.146710592234121
> debug: bayes token 'N:H*r:N.NN.N' => 0.143944051542205
> debug: bayes token 'H*r:8.12.2' => 0.131883948595905
> debug: bayes token 'N:H*M:NNNNN' => 0.107562836072879
> debug: bayes token 'N:HX-Sieve:N.N' => 0.0489090909090909
> debug: bayes token 'HX-Sieve:cmu-sieve' => 0.0489090909090909
> debug: bayes token 'verbally' => 0.0256190476190476
> debug: bayes token 'gels' => 0.0256190476190476
> debug: bayes token 'hairiness' => 0.0173548387096774
> debug: bayes token 'modulating' => 0.0131219512195122
> debug: bayes: score = 0.564805986629628
>
> (Any Bayesian classifier can produce this type of list once you build a
> corpus.  And there's no magic to Bayes that prevents spammers from doing
> this too to figure out how many words need to be counterbalanced, etc.)
>
> Note that my Bayes database picked up on some tokens that were actually
> added by your personal software (like X-Sieve and some of the other
> tokens, like the Message-ID).  If I remove those, my Bayes probability
> would go up.  I didn't get this particular spam, so I don't know if
> Bayes would have worked for me or not.  Probably not enough to catch it
> as spam, though.
>
> Some enhancements to Bayes might be in order.
>
> Daniel
>
> --
> Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
> http://www.pathname.com/~quinlan/   source consulting (looking for new work)
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: INetU
> Attention Web Developers & Consultants: Become An INetU Hosting Partner.
> Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
> INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
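
Daniel's token list and his remark about working out "how many words need to
be counterbalanced" can be illustrated with a small sketch.  It assumes the
Robinson-Fisher chi-squared combining rule, which is one common way to merge
per-token probabilities.  The thread does not say which rule produced the
0.5648 score, so this does not try to reproduce that number.  The names
chi2q, combined_score, and fillers_needed are hypothetical.

```python
import math


def chi2q(x2, v):
    """Upper tail of the chi-squared distribution for even v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combined_score(probs):
    """Merge per-token spam probabilities into one score in [0, 1]."""
    # Clamp so that log() never sees exactly 0 or 1.
    probs = [min(max(p, 1e-6), 1.0 - 1e-6) for p in probs]
    n = len(probs)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


def fillers_needed(spam_probs, filler_prob, threshold, limit=1000):
    """Count filler tokens (each with probability filler_prob) that must be
    appended before the combined score drops below threshold.
    Returns None if limit fillers are not enough."""
    probs = list(spam_probs)
    for count in range(limit + 1):
        if combined_score(probs) < threshold:
            return count
        probs.append(filler_prob)
    return None


# Example: five strongly spammy tokens against fillers scored at 0.02,
# roughly the range of 'verbally' and 'gels' in the list above.
needed = fillers_needed([0.99] * 5, 0.02, 0.5)
```

A real classifier usually scores only the most significant tokens in a
message, so the count from a toy like this is at best a rough guide.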



