Re: Avoiding Bayes Poison

Michel R Vaillancourt Thu, 11 Jan 2007 07:33:33 -0800

Clay Davis wrote:

Over the past several months I have been saving the spam that slips through to my users accounts to train my bayes with. I notice that lately almost all of it has (what I am assuming to be) an attempt to poison my bayes (a bunch of valid words put together in a nonsensical paragraph) at the bottom of it.

How much should I worry about this type of spam and how it will affect my bayes db? Work arounds? Advice?

Thanks, gang.

Clay

        
        Hi, Clay.  Without getting into the math behind it, Bayes poisoning is almost impossible.  I have been training 
my Bayes DB with everything I consider "spam", whether it has a "poison" section or not.  I'm almost 
always seeing a BAYES_99 result on these "poisoned" emails.  Why?  Because the key tokens that make it spam 
are repeated from one message to the next;  the "poison" text is not.
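
To make the "repeated vs. not repeated" point concrete, here is a toy sketch in Python.  It is not SpamAssassin's actual Bayes code; it is a simplified naive-Bayes scorer with Robinson-style smoothing, and every token, count and function name in it is made up for illustration.  The assumption is that every spam carries the same pitch tokens, while each spam's "poison" words are different ordinary words that also show up in normal mail.

```python
import math
from collections import Counter

# Hypothetical data: none of these tokens or counts come from a real corpus.
PITCH = ["cheap", "pills", "pharmacy", "discount"]   # repeated in every spam
ORDINARY = [f"word{k}" for k in range(40)]           # stand-ins for dictionary words

def doc_counts(messages):
    """Count, per token, how many messages contain it."""
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts

def token_prob(tok, spam, ham, n_spam, n_ham, strength=1.0, prior=0.5):
    """Smoothed estimate that a message containing tok is spam."""
    n = spam[tok] + ham[tok]
    if n == 0:
        return prior                      # never-seen token: neutral
    sf = spam[tok] / n_spam               # fraction of spam containing tok
    hf = ham[tok] / n_ham                 # fraction of ham containing tok
    raw = sf / (sf + hf)
    return (strength * prior + n * raw) / (strength + n)

def spam_score(tokens, spam, ham, n_spam, n_ham):
    """Combine the token estimates in log-odds space, return a 0..1 score."""
    log_odds = 0.0
    for tok in set(tokens):
        p = token_prob(tok, spam, ham, n_spam, n_ham)
        log_odds += math.log(p / (1.0 - p))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)

# 20 ham messages: each ordinary word ends up in 4 of them.
ham_msgs = [[ORDINARY[(8 * i + j) % 40] for j in range(8)] for i in range(20)]
# 20 spam messages: same pitch every time, two different "poison" words each,
# so each ordinary word is seen in only 1 spam.
spam_msgs = [PITCH + [ORDINARY[(2 * i + j) % 40] for j in range(2)]
             for i in range(20)]

spam_db, ham_db = doc_counts(spam_msgs), doc_counts(ham_msgs)
n_spam, n_ham = len(spam_msgs), len(ham_msgs)

poisoned = PITCH + ORDINARY[10:18]        # new spam with 8 poison words
normal = ORDINARY[0:8]                    # new legitimate message

poisoned_score = spam_score(poisoned, spam_db, ham_db, n_spam, n_ham)
normal_score = spam_score(normal, spam_db, ham_db, n_spam, n_ham)
print(f"poisoned spam: {poisoned_score:.4f}")
print(f"normal mail:   {normal_score:.6f}")
```

In this toy setup the pitch tokens are seen in all 20 spams and no ham, so each one pulls hard toward spam.  Each poison word has been seen in only one spam but four hams, so training on the poisoned messages barely moves it, and the poisoned message still scores above 0.99 while the normal one stays well under 0.01.  The exact numbers only mean something inside this sketch.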


        I use a combination of auto-training and hand-correction with my DB.  I only 
"correct" if the answer is not a BAYES_99.  Don't sweat the "poison";  Bayes is 
almost immune to Iocane, etc.
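
For the "only correct if it is not a BAYES_99" rule, a rough sketch of how one might pick out the spam worth hand-training is below.  It assumes the messages still carry SpamAssassin's X-Spam-Status header with its tests= list; the function names and the sample message are hypothetical, and the header values in the sample are invented.

```python
import re
from email import message_from_string

def bayes_rules(raw_message):
    """Return the set of BAYES_nn rule names listed in X-Spam-Status."""
    msg = message_from_string(raw_message)
    status = str(msg.get("X-Spam-Status", ""))
    # The header may be folded across lines; a regex search does not care.
    return set(re.findall(r"BAYES_\d+", status))

def needs_hand_correction(raw_message):
    """True for a known-spam message that Bayes did not already rate BAYES_99.

    A message with no X-Spam-Status header at all also returns True,
    since Bayes never rated it.
    """
    return "BAYES_99" not in bayes_rules(raw_message)

# Invented sample: a spam that slipped through with a middling Bayes result.
SAMPLE = (
    "From: someone@example.com\n"
    "Subject: cheap pills\n"
    "X-Spam-Status: No, score=3.2 required=5.0 tests=BAYES_50,\n"
    "\tHTML_MESSAGE autolearn=no\n"
    "\n"
    "body text\n"
)

print(needs_hand_correction(SAMPLE))
```

Messages flagged this way are the ones to feed to sa-learn --spam by hand; anything already hitting BAYES_99 can be left alone.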

--
        --Michel Vaillancourt
        Wolfstar Systems
        www.wolfstar.ca
