The spammer could now bypass Bayes by opening the message with an HTML comment containing 200 words or 20 lines of ham, closing the comment, and then beginning the spam message.
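To make the comment trick concrete, here is a rough Python sketch. The `analysis_window` function is hypothetical (it is not SpamAssassin code); it just assumes the proposed limit is applied to the raw body, keeping the first 20 lines or 200 words, whichever runs out first. The sample message and its padding text are made up for illustration.

```python
def analysis_window(body: str, max_lines: int = 20, max_words: int = 200) -> str:
    """Hypothetical limiter: keep at most max_lines lines and max_words words."""
    words_left = max_words
    kept = []
    for line in body.splitlines()[:max_lines]:
        words = line.split()
        if len(words) >= words_left:
            # Word budget exhausted on this line: keep what fits and stop.
            kept.append(" ".join(words[:words_left]))
            break
        kept.append(line)
        words_left -= len(words)
    return "\n".join(kept)


# Invented example: 20 lines of innocent-looking text hidden in an HTML comment.
ham_padding = "\n".join(
    f"notes from the project meeting, item {i}" for i in range(20)
)
message = (
    f"<html><body><!--\n{ham_padding}\n-->\n"
    "<p>BUY CHEAP PILLS NOW</p></body></html>"
)

window = analysis_window(message)
# The pitch sits below the comment, so it never reaches the classifier.
print("PILLS" in window)
```

With the limit applied to the raw text, everything the classifier sees comes from inside the comment, which the recipient's mail client never displays.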
If you strip HTML before running Bayes, the same thing could be done by switching to a tiny white-on-white font ahead of the bogus ham, padding it with lots of newlines, and then switching back to a readable color to begin the marketing. Once the MUA renders the message as HTML, it will appear to have only a couple of blank lines at the top (because the font is small, and HTML ignores the newlines), but the pitch will miss Bayes entirely.
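A quick sketch of why stripping HTML first doesn't save you, using Python's standard `html.parser`. The `TextExtractor` class, the `strip_html` helper, and the sample message are all made up for illustration; a real filter's HTML handling will differ. The point is only that a tag stripper throws away comments but keeps text that the reader can't see.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes only; tags and comments are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


# Invented example: bogus ham rendered in tiny white-on-white text.
hidden_ham = "\n".join(
    f"minutes of the budget review, part {i}" for i in range(20)
)
message = (
    '<html><body bgcolor="#ffffff">'
    "<!-- comment padding is discarded by the stripper -->"
    f'<font color="#ffffff" size="1">{hidden_ham}</font>\n'
    '<font color="#000000" size="3">BUY CHEAP PILLS NOW</font>'
    "</body></html>"
)

text = strip_html(message)
first_20 = "\n".join(text.splitlines()[:20])

print("comment padding" in text)  # the comment is gone after stripping
print("PILLS" in first_20)        # but the pitch is still past line 20
```

So the stripped text still leads with 20 lines of ham, and a top-of-message limit would cut off before the visible pitch.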
Of course, the regular rules carry some scoring penalties for white text and tiny fonts, but those might not hurt the spammer as badly as Bayes would.
At 01:24 PM 11/20/2002 -0600, Bob Apthorpe wrote:
How much of a message does a human need to read before they classify it as spam? And where in the message? Top? Middle? Bottom?

I'm guessing that the top 5-20 lines of the body will give a human enough information to classify the message so limit the Bayesian analysis of the body text to the top 20 lines or the first 200 words. If you're trying to promote something, you need to get to the point of the pitch very quickly. If we analyze a prominent subsection of the message, we initially avoid analyzing any intentional noise added to 'ham up' the message, assuming spammers put the false ham at the end of the message.
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk