Sean Redmond said the following on 20/11/02 15:25:
No, it doesn't. It puts it into the "unknown" category. I assume SpamAssassin's implementation is using the same rules as spambayes, which means unknown words get a probability of 0.5.

Matt Sergeant wrote:

Also, as I understand his explanation, only the most interesting tokens are considered in calculating the likelihood that it's spam, so watering down the body of the message should only make the interesting things more interesting.

But Graham's analysis is wrong here. If you want to defeat Bayesian filters, the spam of the future will look like:

__BEGIN__
Hey there. Thought you should check out the following: http://www.27meg.com/foo
(snip: linuxy, footbally, disclaimery content)
__END__

This covers a fairly broad section of people's training data (a linuxy type mail, a football related mail, and a corporate legal disclaimer), and so those things will be the "interesting" tokens from the ham corpus.
But this is where the personalization of the corpus is important, because *I* never get football related mail, so that makes it suspicious right there.
I think that proves my point :-)
Disclaimers are so common I don't think they would be considered in the calculation, right?

Wrong. How do you delimit them? I see all sorts here at work. Some up to 150 lines, including at the top and at the bottom. There's no way SpamAssassin could effectively ignore them.
Plus, their pitch would be so buried in all the fluff that you wouldn't be able to find it unless they made the linuxy text very small or white-on-white or clear, and those HTML tags would then become *very* statistically significant.

Use one tag. Then it's a single 1.0 score. And my example already made it white on white.
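To make the mechanics being argued over concrete, here is a minimal sketch of Graham-style scoring in Python. It is not SpamAssassin's or spambayes' actual code. The token table, the token names (including the made-up tag token `font_color_white`), and the cutoff of 15 tokens (borrowed from Graham's "A Plan for Spam") are illustrative assumptions; only the 0.5 for unknown words comes from the thread.

```python
from math import prod

UNKNOWN_PROB = 0.5    # unknown words get 0.5, as stated in the thread
MAX_INTERESTING = 15  # assumption: cutoff taken from Graham's "A Plan for Spam"

def spam_score(tokens, token_probs, max_interesting=MAX_INTERESTING):
    """Combine per-token spam probabilities, using only the tokens
    whose probability lies furthest from the neutral 0.5."""
    probs = [token_probs.get(t, UNKNOWN_PROB) for t in set(tokens)]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:max_interesting]
    spam = prod(top)
    ham = prod(1.0 - p for p in top)
    return spam / (spam + ham)

# Hypothetical per-user training data: P(spam | token).
token_probs = {
    "click": 0.90, "free": 0.92, "offer": 0.95,   # spammy pitch words
    "font_color_white": 0.99,                      # hypothetical HTML tag token
    "kernel": 0.01, "patch": 0.02, "gcc": 0.02,    # linuxy ham
    "offside": 0.03, "striker": 0.03,              # footbally ham
    "confidential": 0.05, "recipient": 0.05,       # disclaimery ham
}

pitch = ["click", "free", "offer", "zzyzx"]        # "zzyzx" is unknown
padding = ["kernel", "patch", "gcc", "offside", "striker",
           "confidential", "recipient"]

bare = spam_score(pitch, token_probs)
padded = spam_score(pitch + padding, token_probs)
with_tag = spam_score(pitch + padding + ["font_color_white"], token_probs)

print(f"bare pitch:      {bare:.6f}")
print(f"padded pitch:    {padded:.6f}")
print(f"padded plus tag: {with_tag:.6f}")
```

With this made-up table the bare pitch scores close to 1, the padded message scores close to 0, and a single strongly spammy tag token is not enough to pull it back up. A table trained on mail from someone who never receives football mail would not have low values for the football tokens, which is where the personalization argument comes in.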
I've been doing Bayesian filtering for about 10 months now. I've thought about this a lot. I really don't think it's invincible, but that doesn't mean it's ineffective.
Matt.