Clay Davis wrote:
Over the past several months I have been saving the spam that slips through to my users' accounts to train my Bayes DB with. I notice that lately almost all of it has what I assume is an attempt to poison my Bayes DB (a bunch of valid words put together in a nonsensical paragraph) at the bottom. How much should I worry about this type of spam, and how will it affect my Bayes DB? Workarounds? Advice? Thanks, gang. Clay
        
        Hi, Clay.  Without getting into the math behind it, Bayes poisoning is almost impossible.  I have been training
my Bayes DB with everything I consider "spam", whether it has a "poison" section or not.  I almost
always see a BAYES_99 result on these "poisoned" emails.  Why?  Because the key tokens that make a message spam
are repeated from one spam to the next; the "poison" text is not.
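
        For anyone who does want a peek at the math, here is a rough Python sketch of the idea.  It is a toy token scorer, not SpamAssassin's actual Bayes code: the function names, the tiny training sets, and the smoothing values s and x are all made up for illustration.  It only shows that tokens seen in many spams get a strong spam probability, while random filler words that were never (or rarely) seen stay close to neutral.

```python
import math
from collections import Counter

def train(messages):
    """Count how many training messages contain each token."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def token_prob(token, spam_counts, ham_counts, n_spam, n_ham, s=1.0, x=0.5):
    """Smoothed spam probability for one token (s and x are assumed values)."""
    seen = spam_counts[token] + ham_counts[token]
    if seen == 0:
        return x  # never-seen word: stays neutral
    spam_freq = spam_counts[token] / n_spam
    ham_freq = ham_counts[token] / n_ham
    raw = spam_freq / (spam_freq + ham_freq)
    # Pull rarely seen tokens toward the neutral value x.
    return (s * x + seen * raw) / (s + seen)

def spam_score(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token probabilities naive-Bayes style, in log space."""
    probs = [token_prob(t, spam_counts, ham_counts, n_spam, n_ham)
             for t in set(text.lower().split())]
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical training data, for illustration only.
SPAM = ["cheap pills online order now",
        "order cheap pills today",
        "cheap pills discount order now"]
HAM = ["meeting notes attached for review",
       "lunch on friday with the team",
       "please review the budget report"]

spam_counts = train(SPAM)
ham_counts = train(HAM)

# A spam with a "poison" tail of unrelated and ham-looking words.
poisoned = "order cheap pills now garden river lantern meeting review"
print(spam_score(poisoned, spam_counts, ham_counts, len(SPAM), len(HAM)))
```

        In this toy setup the repeated tokens ("order", "cheap", "pills") outweigh the filler, so the poisoned message still scores as spam.  A real filter uses a different tokenizer and combining method, so treat the numbers as illustrative only.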

        I use a combination of auto-training and hand-correction with my DB.  I only
"correct" a message if its result is not BAYES_99.  Don't sweat the "poison"; Bayes is
almost immune to Iocane, etc.

--
        --Michel Vaillancourt
        Wolfstar Systems
        www.wolfstar.ca
