Clay Davis wrote:
Over the past several months I have been saving the spam that slips through to my users' accounts to train my Bayes with. I notice that lately almost all of it has (what I am assuming to be) an attempt to poison my Bayes (a bunch of valid words put together in a nonsensical paragraph) at the bottom of it. How much should I worry about this type of spam, and how will it affect my Bayes DB? Workarounds? Advice? Thanks, gang. Clay
Hi, Clay. Without getting into the math behind it, Bayes poisoning is almost impossible. I have been training my Bayes DB with everything I consider "spam", whether it has a "poison" section or not. I'm almost always seeing a BAYES_99 result on these "poisoned" emails. Why? Because the key tokens that make it spam are repeated from one message to the next; the "poison" text is not.
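For anyone who does want a peek at the math, here is a rough sketch of the effect. It is a simplified Graham-style naive Bayes combiner, not SpamAssassin's actual Bayes code, and the tiny corpus, the word lists, and the function names (count_messages, token_prob, spam_score) are all made up for illustration. The sales-pitch tokens show up in every spam, while each "poison" word shows up in only one spam and in a couple of ham messages.

```python
import math
from collections import Counter

# Hypothetical toy corpus: every spam repeats the same pitch tokens,
# but each one carries a different pair of "poison" words.
PITCH = ["cheap", "pills", "click", "unsubscribe"]

spam_msgs = [
    PITCH + ["meeting", "garden"],
    PITCH + ["report", "river"],
    PITCH + ["window", "lunch"],
    PITCH + ["budget", "train"],
]
ham_msgs = [
    ["meeting", "report", "budget", "lunch"],
    ["meeting", "garden", "river", "window"],
    ["report", "budget", "train", "lunch"],
    ["garden", "river", "window", "train"],
]


def count_messages(msgs):
    """Count, for each token, how many messages contain it."""
    counts = Counter()
    for msg in msgs:
        counts.update(set(msg))
    return counts


spam_counts = count_messages(spam_msgs)
ham_counts = count_messages(ham_msgs)


def token_prob(token):
    """Per-token spam probability, clamped to [0.01, 0.99]."""
    s = spam_counts[token] / len(spam_msgs)
    h = ham_counts[token] / len(ham_msgs)
    if s + h == 0:
        return 0.5  # never seen before: treat as neutral
    return min(0.99, max(0.01, s / (s + h)))


def spam_score(tokens):
    """Combine per-token probabilities by summing their log-odds."""
    log_odds = sum(
        math.log(p / (1 - p)) for p in map(token_prob, set(tokens))
    )
    return 1 / (1 + math.exp(-log_odds))


# A new spam with six poison words, and a message of poison words only.
poisoned = PITCH + ["meeting", "report", "garden", "river", "window", "lunch"]
poison_only = ["meeting", "report", "garden"]

print(f"pitch + poison: {spam_score(poisoned):.6f}")
print(f"poison only:    {spam_score(poison_only):.6f}")
```

In this toy setup the poisoned message still scores well above 0.99, because each pitch token pulls hard toward spam while each poison word only leans mildly toward ham. The poison-only message stays below 0.5, so training on the poisoned spam has not turned ordinary words into spam indicators.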
I use a combination of auto-training and hand-correction with my DB. I only "correct" if the answer is not a BAYES_99. Don't sweat the "poison"; Bayes is almost immune to Iocane, etc. -- --Michel Vaillancourt Wolfstar Systems www.wolfstar.ca