On Thu, 2003-06-26 at 13:38, Fox Flanders wrote:
> I blame SpamAssassin for these Bayes bypassing tricks.  I had a custom Bayes
> solution working many months before it appeared in SpamAssassin.  There was
> none of this bypassing crudola happening until SpamAssassin popularized
> Bayes :)

> Now I get messages with a spam text/html mime part and the Declaration of
> Independence for a text/plain mime part to fool the Bayes filter.  It is
> very effective at maiming my Bayes filter.  Of course I could start dropping
> text/plain mime parts when there is a text/html part present.
> 
> Fox

Correct me if I'm wrong here.

In a legitimate email message, the plain part and the HTML part (with
markup removed) should be substantially similar, or at least share a
high percentage of identical words. The half-dozen messages I just
looked at are pretty much identical once stripped of HTML and
linefeeds.

Maybe something like a MIME-part variance score could be computed: the
percentage difference between the plain part and the stripped HTML
part, or even something as simple as picking a few words at random from
each part and looking for them in the other. That would give something
else to score spam on, or Bayes could just change its behaviour for
messages whose MIME parts differ widely.
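
To make the idea concrete, here is a rough sketch in Python using only
the standard email and html.parser modules. The names part_variance,
strip_html and words are made up for illustration, the word-set
(Jaccard) comparison is just one possible way to measure "percentage
difference", and none of this is how SpamAssassin actually works.

```python
import re
from email import message_from_string
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def strip_html(html):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def words(text):
    """Lower-cased set of word tokens; linefeeds and punctuation are ignored."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def _decode(part):
    """Decode a MIME part's payload to str, falling back to UTF-8."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name in the header
        return payload.decode("utf-8", errors="replace")


def part_variance(raw_message):
    """Return 0.0 (same words) to 1.0 (no shared words) between the
    text/plain and text/html parts, or None if either part is missing."""
    msg = message_from_string(raw_message)
    plain, html = [], []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain":
            plain.append(_decode(part))
        elif ctype == "text/html":
            html.append(_decode(part))
    if not plain or not html:
        return None
    a = words(" ".join(plain))
    b = words(strip_html(" ".join(html)))
    union = a | b
    if not union:
        return 0.0
    # Jaccard distance: share of words that appear in only one part.
    return 1.0 - len(a & b) / len(union)
```

A message whose parts say the same thing should score near 0.0, while
spam HTML paired with the Declaration of Independence should score
close to 1.0. Where to put the threshold is a guess that would need
testing against real mail.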

-- 
Yorkshire Dave


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
