Bowie Bailey wrote:
> JD Smith wrote:
>> So, what exactly is bayes poison?
>
> "Bayes poison" is a collection of random words or text selections that
> have nothing to do with the email subject and are only there in an
> attempt to confuse the Bayes database. This doesn't really work the
> way the spammers would like to think it does, but they keep doing it
> anyway.

How well bayes poison works depends a lot on your "bayes" implementation, and some implementations are fairly susceptible to it. (I put "bayes" in quotes because not all bayes implementations are really Bayesian. Actually, most are not, including SA.)

In particular, the choice of combining algorithm seems to matter a lot. Using chi-squared combining instead of true Bayesian combining seems to make SA's bayes rather resistant to poisoning. (Note: chi-squared is not exclusive to SA. Many "bayes" implementations use it, but not all.)

Another area of influence is the choice of tokens. Words vs. chars, how hapaxes (tokens seen only once) are handled, etc. all change how a bayes implementation reacts to poisoning attempts.

So spammers keep using bayes poison because it works in some cases. Against the more resistant implementations it doesn't really hurt them much, and sometimes it even helps them.
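
To make the combining point a bit more concrete, here is a rough Python sketch. It is not SA's code. It compares a naive Bayesian combiner (equal priors, independent tokens) with a chi-squared combiner in the style of Fisher's method. The function names and the per-token spam probabilities are made up for illustration, and real poison words would mostly be far less "hammy" than the ones assumed here.

```python
import math


def naive_bayes_combine(probs):
    """Naive Bayesian combining, assuming equal priors and independent tokens.

    Works in log-odds space so long token lists do not underflow.
    """
    log_odds = sum(math.log(p) - math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(-log_odds))


def chi2q(x2, dof):
    """Survival function of the chi-squared distribution for even dof."""
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs):
    """Chi-squared combining: 0.0 = ham, 1.0 = spam, about 0.5 = unsure."""
    n = len(probs)
    # Spam indicator: near 1.0 when the message contains strong spam tokens.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Ham indicator: near 1.0 when the message contains strong ham tokens.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


# Hypothetical per-token spam probabilities, chosen only for illustration.
spam = [0.99] * 10    # ten strongly spammy tokens
poison = [0.01] * 12  # twelve strongly hammy "poison" tokens

print("naive, spam only:      ", naive_bayes_combine(spam))
print("chi2,  spam only:      ", chi2_combine(spam))
print("naive, spam + poison:  ", naive_bayes_combine(spam + poison))
print("chi2,  spam + poison:  ", chi2_combine(spam + poison))
```

With these made-up numbers, the naive combiner lets the poison outvote the spam tokens and calls the message ham with high confidence. The chi-squared combiner sees strong evidence on both sides and lands near the middle, i.e. "unsure" rather than "ham".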