Ross Vandegrift wrote:
On Tue, Nov 19, 2002 at 05:14:46PM +0000, Justin Mason wrote:
> I notice the bayes-busting spam posted on spambayes used linuxy text.
> That probably works quite well given the current bayes userset ;)

Ahhhhh, that's friggin genius! And it's almost a perfectly crafted attack on Bayes-like filtering systems. That is gonna be hard to beat without diluting the power of recognizing mail about linuxy type stuff.

Paul Graham deals with this point in his article that started it all (http://paulgraham.com/spam.html). What he basically says (or hopes) is that the more innocuous the content looks, the more important the headers and urls become:
<quote>
To beat Bayesian filters, it would not be enough for spammers to make their emails unique or to stop using individual naughty words. They'd have to make their mails indistinguishable from your ordinary mail. And this I think would severely constrain them. Spam is mostly sales pitches, so unless your regular mail is all sales pitches, spams will inevitably have a different character. And the spammers would also, of course, have to change (and keep changing) their whole infrastructure, because otherwise the headers would look as bad to the Bayesian filters as ever, no matter what they did to the message body. I don't know enough about the infrastructure that spammers use to know how hard it would be to make the headers look innocent, but my guess is that it would be even harder than making the message look innocent.
Assuming they could solve the problem of the headers, the spam of the future will probably look something like this:
Hey there. Thought you should check out the following: http://www.27meg.com/foo
because that is about as much sales pitch as content-based filtering will leave the spammer room to make. (Indeed, it will be hard even to get this past filters, because if everything else in the email is neutral, the spam probability will hinge on the url, and it will take some effort to make that look neutral.)
</quote>
Also, as I understand his explanation, only the most interesting tokens are considered in calculating the likelihood that a message is spam, so watering down the body of the message should only make the interesting things more interesting.
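
To make that concrete, here is a rough sketch of the combining step as I read it in the article: keep only the tokens whose spam probability is farthest from a neutral 0.5 (the article uses fifteen), then combine them. The function name, the token probabilities, and the padding below are all made up for illustration; this is not code from the article or from any real filter.

```python
from math import prod

def spam_probability(token_probs, n_interesting=15):
    """Combine per-token spam probabilities using only the most interesting ones.

    "Interesting" here means farthest from the neutral value 0.5.
    Assumes every probability is strictly between 0 and 1.
    """
    top = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spam = prod(top)                   # evidence that the message is spam
    ham = prod(1.0 - p for p in top)   # evidence that it is not
    return spam / (spam + ham)

# Hypothetical numbers: three damning tokens (say, from the headers and the URL)
strong = [0.99, 0.99, 0.98]
# ...and a body watered down with 200 tokens the filter considers neutral
padding = [0.5] * 200

bare = spam_probability(strong)
padded = spam_probability(strong + padding)
```

With these invented numbers, bare and padded come out the same, since neutral tokens contribute equally to both products. Note that this only covers neutral filler; padding made of words the filter already rates as strongly innocent would compete for those fifteen slots.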
--
Sean Redmond
BMA Information Systems