Hello,

I've implemented a Bayesian filtering scheme on my system that runs concurrently with SpamAssassin. It works really well, but I am starting to think there is an easy attack that would render the filtering useless.

What if, at the end of every message, spammers appended a list of a thousand or more randomly selected common dictionary words? Wouldn't these words overwhelm a Bayesian filtering scheme? Sure, the spam phrases would still be present in the top part of the message, but the common, non-spam words at the bottom would make the message appear, statistically, less spam-like, perhaps enough to get it past the filter. Further, as these messages were added to a user's spam corpus, wouldn't the common words in them come to be associated with spam, so that legitimate messages would start to look, statistically speaking, like spam, thus increasing false positives?
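
To make the concern concrete, here is a rough sketch of the dilution effect. It is not SpamAssassin's code or any particular filter's method: it assumes a simple naive-Bayes combination that counts every token in the message, and the per-token spam probabilities (0.99 for spammy tokens, 0.4 for padding words that lean slightly toward legitimate mail) are invented for illustration. The function name is hypothetical.

```python
import math

def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities into one score.

    Assumption: a plain naive-Bayes combination over ALL tokens,
    done in log-odds space to avoid floating-point underflow.
    """
    log_odds = sum(math.log(p) - math.log(1.0 - p) for p in token_probs)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# Made-up numbers: 20 strongly spammy tokens in the body of the message.
spam_only = [0.99] * 20

# The same message with 1000 appended dictionary words that the
# filter has seen slightly more often in legitimate mail.
padded = spam_only + [0.4] * 1000

# Padding with perfectly neutral words (p = 0.5) for comparison.
neutral_padded = spam_only + [0.5] * 1000

print("spam only:       ", combined_spam_probability(spam_only))
print("ham-leaning pad: ", combined_spam_probability(padded))
print("neutral pad:     ", combined_spam_probability(neutral_padded))
```

Under these assumptions the padded message scores far lower than the unpadded one, while exactly neutral words (probability 0.5) contribute nothing either way. A filter that scores only a limited number of the most extreme tokens, rather than all of them, would behave differently, so this only illustrates the worry and does not settle it.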

Perhaps this notion is based on a misunderstanding of how Bayesian filtering works, or perhaps there are ways of working around it, but has anyone given this idea any thought?

Thanks,

Chris Eykamp



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
