In the last few minutes I've received quite a few spams that slipped past SA, with subject lines like "dies in McDonalds", so I looked at the message source (included below) to see how they were scoring. In every case the HTML content (at least as displayed in Outlook Express) was fairly consistent, but the plain-text version looked like typical Bayes-poisoning text.

Would it be possible to craft a rule that roughly compares the text/plain part of a message with the HTML-stripped text/html part, and scores against the message if the words they contain differ significantly? Or is that technically infeasible?
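
To make the idea concrete, here is a rough standalone Python sketch of the comparison. It is not a SpamAssassin rule or plugin, just an illustration using the standard library. The function names (`words`, `strip_html`, `part_similarity`), the three-letter minimum word length, and the use of Jaccard overlap as the "how different are they" measure are all my own assumptions.

```python
import email
import re
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <style> and <script>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def words(text):
    """Lower-cased set of words of three or more letters (arbitrary cutoff)."""
    return set(re.findall(r"[a-z']{3,}", text.lower()))


def strip_html(html):
    """Return only the visible text of an HTML document."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def part_similarity(raw_message):
    """Jaccard overlap (0.0 to 1.0) between the words of the first
    text/plain part and the first text/html part.

    Returns None when either part is missing or contains no words,
    since there is nothing to compare in that case.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    plain = html = None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain" and plain is None:
            plain = part.get_content()  # decodes quoted-printable/base64
        elif ctype == "text/html" and html is None:
            html = part.get_content()
    if plain is None or html is None:
        return None
    a, b = words(plain), words(strip_html(html))
    if not a or not b:
        return None
    return len(a & b) / len(a | b)
```

I would expect a message like the one quoted below to come out near 0 and a legitimate multipart/alternative mail to come out close to 1, but I have not tested where a sensible cutoff would be.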




 Content-Type: text/plain;


Hello,
    5.  Kislovodsk:  Literally  `acid  waters,  a  popular resort  in  t=
he =
   `Thats wonderful! Koroviev  yelled. Somewhat stunned by his  =
chatter,that  one  could execute  such  a man.  There  had  been  no  =
execution!  Nocloser, youll see the details.midnight moon. A greenish =
kerchief of  night-light fell from the window-sillup still more ... She =
greedily began gulping down caviar.up to the footboard of an A tram =
waiting at a stop, brazenly elbow aside a     Here he applauded, but =
quite  alone, while a confident smile  played onthat might occur at the =
time of the execution in the city of Yershalaim,  sospeaking, I had =
nothing more to do, and I lived from one meeting with her toPetrakovs. =
Placing his bulging briefcase on the table, Boba  immediately =
putposts?[6]horizon. He did not rejoice in the staggeringly beautiful =
view  which openedpaying or free, but even changes countenance at any =
theatrical conversation.what she was going to tell the neighbours the =
next day.phrase:

#########

 Content-Type: text/html;

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Dus-ascii">
<META content=3D"MSHTML 6.00.2800.1106" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial>A court has sentenced a man to life in jail for the =
=

bombing of a McDonald's restaurant, which left three people =
dead.</FONT></DIV>
<DIV><FONT face=3DArial></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial>The man, Agung Abdul Hamid, was found guilty of =
financing
and co-ordinating the attack.</FONT></DIV>
<DIV><FONT face=3DArial></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial><A href=3D"http://www.ildhd.lastrez.com";>Read full =
=
story.</A></FONT></DIV>
<DIV>&nbsp;</DIV></BODY></HTML>
