Christopher Eykamp said the following on 18/11/02 23:15:

I've implemented a Bayesian filtering scheme on my system that runs concurrently with SpamAssassin. It works really well, but I am starting to think there is an easy attack that would render the filtering useless.

What if, at the end of every message, spammers appended a list of a thousand or more randomly selected common dictionary words? Wouldn't these words overwhelm a Bayesian filtering scheme? Sure, the spam phrases would still be present in the top part of the message, but the common, non-spam words at the bottom would make the message appear, statistically, less spam-like, perhaps enough to get it past the filter. Further, as these messages were added to a user's spam corpus, would not legitimate messages start to look, statistically speaking, like spam, thus increasing false positives?
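The second worry, that training on padded spam drags ordinary words toward "spammy", can be illustrated with a toy per-token estimate. This is only a sketch: the function name, the corpus sizes, and all counts below are made up for illustration, and real filters apply smoothing and other corrections that are left out here.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate P(spam | token) from how often the token appears in each corpus.

    spam_hits / ham_hits: number of spam / legitimate messages containing the token.
    n_spam / n_ham: total number of messages in each corpus.
    """
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    return spam_rate / (spam_rate + ham_rate)

# Hypothetical counts for one common dictionary word.
before = token_spam_probability(spam_hits=20, ham_hits=400, n_spam=1000, n_ham=1000)

# Suppose 300 padded spams, each containing the word, are added to the spam corpus.
after = token_spam_probability(spam_hits=320, ham_hits=400, n_spam=1300, n_ham=1000)

print(f"before padding attack: {before:.3f}")
print(f"after padding attack:  {after:.3f}")
```

With these invented numbers the word drifts from a strong "legitimate" indicator toward neutral rather than becoming a strong spam indicator, so the main effect in this toy case is that the word stops helping legitimate mail get through.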

Perhaps this notion is based on a misunderstanding of how Bayesian filtering works, or perhaps there are ways of working around it, but has anyone given this idea any thought?

The spammers have. An even better way they've found is to include a snippet from a legit mailing list, but put it in a box of white text on a white background. This was discussed on the spambayes mailing list.

Justin says it's not always going to be terribly effective, but I disagree. If you blast your email full of words that would be innocuous for most people, it's probably going to get around Bayesian filters. Of course, this may end up being just another thing SpamAssassin can detect, so we have to play the waiting game and see what spammers come up with next.
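To make the disagreement concrete, here is a rough sketch of how padding affects a naive Bayes-style combined score. The per-token probabilities are invented, and the optional `max_tokens` cutoff (scoring only the tokens furthest from 0.5) is an assumption about how a given filter might be configured, not a description of any particular one.

```python
from math import prod

def combined_spam_probability(token_probs, max_tokens=None):
    """Combine per-token spam probabilities into one message score.

    If max_tokens is given, only the tokens whose probability is furthest
    from 0.5 (the most "interesting" ones) are used.
    """
    probs = list(token_probs)
    if max_tokens is not None:
        probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:max_tokens]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

# Hypothetical values: four strongly spammy tokens, then 20 appended
# common words that each lean mildly toward legitimate mail.
spammy_tokens = [0.99, 0.98, 0.95, 0.97]
padding = [0.30] * 20

original = combined_spam_probability(spammy_tokens)
padded = combined_spam_probability(spammy_tokens + padding)
padded_top15 = combined_spam_probability(spammy_tokens + padding, max_tokens=15)

print(f"original:             {original:.4f}")
print(f"padded, all tokens:   {padded:.4f}")
print(f"padded, top 15 only:  {padded_top15:.4f}")
```

In this toy setup, scoring every token lets the padding pull the message well below 0.5, while scoring only the 15 most extreme tokens keeps it high. Which of us is right in practice would depend on details like that cutoff and on how strongly the padding words lean toward legitimate mail for a given user.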

As Craig says, it only matters that we catch spam from the dumb 95% of spammers out there. If we miss the smart 5% that's OK, and we'll get them next time around.


Spamassassin-talk mailing list