In the last few minutes I've received quite a few spams that slipped past SA,
with subject lines like "dies in McDonalds", so I looked at the message
source (included below) to see how they were scoring. In every case the HTML
content (at least as displayed in Outlook Express) was fairly consistent, but
the plain-text version looked like typical Bayes-poisoning text.

Would it be possible to craft a rule that roughly compares the text/plain
and HTML-stripped text/html versions of a message and scores against the
message if the words they contain are significantly different? Or is that
technically infeasible?
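
To make the idea concrete, here is a rough sketch of the comparison in Python
rather than as an actual SA rule or plugin. It assumes the message is
multipart/alternative, strips the HTML part with the standard-library parser,
and measures word overlap as a Jaccard ratio. The names `words`,
`html_to_text`, and `part_overlap` are made up for illustration, and the choice
of "words of three or more letters" is an arbitrary assumption.

```python
import email
import re
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <style> and <script>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def words(text):
    # Assumption: a "word" is a run of 3+ letters or apostrophes, lowercased.
    return set(re.findall(r"[a-z']{3,}", text.lower()))


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def part_overlap(raw_message):
    """Jaccard similarity of the two parts' word sets.

    Returns None when either part is missing or contains no words.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    if plain is None or html is None:
        return None
    a = words(plain.get_content())
    b = words(html_to_text(html.get_content()))
    if not a or not b:
        return None
    return len(a & b) / len(a | b)
```

A rule built on this would fire when `part_overlap` falls below some cutoff.
What that cutoff should be is an open question: legitimate mailers sometimes
send a shortened plain-text part, so the threshold would need tuning against
real ham before it could carry much score.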
Content-Type: text/plain;
Hello,
5. Kislovodsk: Literally `acid waters, a popular resort in t=
he =
`Thats wonderful! Koroviev yelled. Somewhat stunned by his =
chatter,that one could execute such a man. There had been no =
execution! Nocloser, youll see the details.midnight moon. A greenish =
kerchief of night-light fell from the window-sillup still more ... She =
greedily began gulping down caviar.up to the footboard of an A tram =
waiting at a stop, brazenly elbow aside a Here he applauded, but =
quite alone, while a confident smile played onthat might occur at the =
time of the execution in the city of Yershalaim, sospeaking, I had =
nothing more to do, and I lived from one meeting with her toPetrakovs. =
Placing his bulging briefcase on the table, Boba immediately =
putposts?[6]horizon. He did not rejoice in the staggeringly beautiful =
view which openedpaying or free, but even changes countenance at any =
theatrical conversation.what she was going to tell the neighbours the =
next day.phrase:
#########
Content-Type: text/html;
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Dus-ascii">
<META content=3D"MSHTML 6.00.2800.1106" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial></FONT> </DIV>
<DIV><FONT face=3DArial>A court has sentenced a man to life in jail for the
=
=
bombing of a McDonald's restaurant, which left three people =
dead.</FONT></DIV>
<DIV><FONT face=3DArial></FONT> </DIV>
<DIV><FONT face=3DArial>The man, Agung Abdul Hamid, was found guilty of =
financing
and co-ordinating the attack.</FONT></DIV>
<DIV><FONT face=3DArial></FONT> </DIV>
<DIV><FONT face=3DArial><A href=3D"http://www.ildhd.lastrez.com">Read full =
=
story.</A></FONT></DIV>
<DIV> </DIV></BODY></HTML>