On Thu, 2003-06-26 at 13:38, Fox Flanders wrote:
> I blame SpamAssassin for these Bayes bypassing tricks. I had a custom Bayes
> solution working many months before it appeared in SpamAssassin. There was
> none of this bypassing crudola happening until SpamAssassin popularized
> Bayes :)
>
> Now I get messages with a spam text/html mime part and the Declaration of
> Independence for a text/plain mime part to fool the Bayes filter. It is
> very effective at maiming my Bayes filter. Of course I could start dropping
> text/plain mime parts when there is a text/html part present.
>
> Fox

Correct me if I'm wrong here. In a legitimate email message, the plain
part and the HTML part (with markup removed) should be substantially
similar, or at least share a high percentage of identical words. The
half-dozen I just looked at are pretty much identical once stripped of
HTML and linefeeds.

Maybe a mime-part variance score could be computed: the percentage
difference between the plain part and the stripped HTML part, or even
something as simple as picking a few words at random from each part and
looking for them in the other. That would give something else to score
spam on, or at least let Bayes modify its behaviour for messages with
large differences between mime parts.

-- 
Yorkshire Dave

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
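
A rough sketch of the mime-part variance idea from the reply above, in
Python using only the standard library. Nothing here is SpamAssassin
code: the names strip_html, words and part_variance are made up for
illustration, and using Jaccard distance over word sets as the
"percentage difference" is an assumption, not something the thread
specifies. It takes the two parts as already-decoded strings.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def strip_html(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def words(text):
    """Lower-cased set of word tokens; the tokenizer is deliberately crude."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def part_variance(plain, html):
    """Jaccard distance between the word sets of the two parts.

    0.0 means the parts use the same words, 1.0 means no word in common.
    """
    a = words(plain)
    b = words(strip_html(html))
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)
```

A score near 0.0 would be expected for ordinary multipart/alternative
mail, and a score near 1.0 for the Declaration-of-Independence trick
described in the quoted message. Where to put the threshold, and how
much weight to give the score, would have to be tuned against real mail.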