After working brilliantly for a long time, the Bayes engine is suddenly misclassifying messages and producing false positives.
An occasional false positive is nothing new; when one occurred I would train on the error and that corrected the issue. The problem I am seeing now is that when I train on the message, the Bayes engine does not learn it as ham and still classifies it as 99% likely to be spam.

  | sa-learn --ham   # no significant effect now

Any clues as to why it is refusing to learn the message as non-spam? More importantly, what steps should I take to correct the situation and get the Bayes engine tuned up? Is it just being overwhelmed by the number of spam messages without enough ham to keep things balanced?

Thanks
Bob

The mail addresses below have been lightly obfuscated by hand.

[15528] dbg: bayes: tie-ing to DB file R/O /home/mlmgr/.spamassassin/bayes_toks
[15528] dbg: bayes: tie-ing to DB file R/O /home/mlmgr/.spamassassin/bayes_seen
[15528] dbg: bayes: found bayes db version 3
[15528] dbg: bayes: DB journal sync: last sync: 1162878184
[15528] dbg: bayes: DB journal sync: last sync: 1162878184
[15528] dbg: bayes: corpus size: nspam = 95535, nham = 38741
[15528] dbg: bayes: header tokens for Envelope-to = " [EMAIL PROTECTED]"
[15528] dbg: bayes: header tokens for *M = " OEA0023 OEB4178 OEC0a01a8c0 kjetil3c0c3416 "
[15528] dbg: bayes: header tokens for *F = "U*kjetb D*online.no D*no"
[15528] dbg: bayes: header tokens for To = "U*bug-list D*bogus.example.com D*com"
[15528] dbg: bayes: header tokens for MIME-Version = " "
[15528] dbg: bayes: header tokens for *c = " multipart/alternative; ----=_ NHxtPHrt _ HHH _ HHHH _ HHHHHHHH . HHHHHHHH"
[15528] dbg: bayes: header tokens for X-Priority = " 3"
[15528] dbg: bayes: header tokens for X-MSMail-Priority = " Normal"
[15528] dbg: bayes: header tokens for *x = " Microsoft Outlook Express 6.00.2900.2869"
[15528] dbg: bayes: header tokens for Disposition-Notification-To = " User Name <[EMAIL PROTECTED]>"
[15528] dbg: bayes: header tokens for X-MimeOLE = " Produced By Microsoft MimeOLE V6.00.2900.2962"
[15528] dbg: bayes: header tokens for X-Spam-Relays-External = " [ ip=193.213.115.46 rdns=mail46.e.nsc.no helo=mail46.e.nsc.no by=monty-python.example.com ident= envfrom= intl=0 id=1GfOG5-0006Pw-Uh auth= ] [ ip=85.164.226.109 rdns=ti400720a081-8813.bb.online.no helo=kjetil3c0c3416 by=mail46.nsc.no ident= envfrom= intl=0 id=kA1M8Nuo009386 auth= ]"
[15528] dbg: bayes: header tokens for X-Spam-Relays-Internal = " [ ip=199.232.76.173 rdns=monty-python.example.com helo=monty-python.example.com by=lists.example.com ident= envfrom= intl=1 id=1GfOG6-0002kO-Gu auth= ]"
[15528] dbg: bayes: header tokens for *RT = " [ ip=199.232.76.173 rdns=monty-python.example.com helo=monty-python.example.com by=lists.example.com ident= envfrom= intl=1 id=1GfOG6-0002kO-Gu auth= ]"
[15528] dbg: bayes: header tokens for *RU = " [ ip=193.213.115.46 rdns=mail46.e.nsc.no helo=mail46.e.nsc.no by=monty-python.example.com ident= envfrom= intl=0 id=1GfOG5-0006Pw-Uh auth= ] [ ip=85.164.226.109 rdns=ti400720a081-8813.bb.online.no helo=kjetil3c0c3416 by=mail46.nsc.no ident= envfrom= intl=0 id=kA1M8Nuo009386 auth= ]"
[15528] dbg: bayes: header tokens for *r = " kjetil3c0c3416 (ti400720a081-8813.bb.online.no [85.164.226 ip*85.164.226.109 ]) by mail46.nsc.no (8.13.8/8.13.5) <[EMAIL PROTECTED]>; "
[15528] dbg: bayes: header tokens for *r = " kjetil3c0c3416 (ti400720a081-8813.example.com [85.164.226 ip*85.164.226.109 ]) by mail46.example.com (8.13.8/8.13.5) <[EMAIL PROTECTED]>; [193.213.115 ip*193.213.115.46 ] (helo=mail46.example.com) by monty-python.example.com esmtp (Exim 4.52) id 1GfOG5-0006Pw-Uh [EMAIL PROTECTED]; "
[15528] dbg: bayes: token 'H*c:alternative' => 0.994304725802302
[15528] dbg: bayes: token 'H*c:HHH' => 0.992454564805099
[15528] dbg: bayes: token 'H*c:NHxtPHrt' => 0.991016151567721
[15528] dbg: bayes: token 'H*c:HHHH' => 0.985263624445525
[15528] dbg: bayes: token 'H*MI:OEA0023' => 0.985096774193548
[15528] dbg: bayes: token 'H*M:OEA0023' => 0.985096774193548
[15528] dbg: bayes: token 'H*UA:Express' => 0.985060557114832
[15528] dbg: bayes: token 'H*x:Express' => 0.985059973253254
[15528] dbg: bayes: token 'H*c:HHHHHHHH' => 0.983903973265669
[15528] dbg: bayes: token 'H*UA:Microsoft' => 0.978901128935159
[15528] dbg: bayes: token 'H*x:Microsoft' => 0.978777767154235
[15528] dbg: bayes: token 'H*x:Outlook' => 0.978777376366316
[15528] dbg: bayes: token 'H*UA:Outlook' => 0.978777376366316
[15528] dbg: bayes: token 'HX-MimeOLE:V6.00.2900.2962' => 0.976898908840907
[15528] dbg: bayes: token 'HX-MimeOLE:MimeOLE' => 0.976313886128059
[15528] dbg: bayes: token 'HX-MSMail-Priority:Normal' => 0.974305670960733
[15528] dbg: bayes: token 'HX-MimeOLE:Microsoft' => 0.959224439139177
[15528] dbg: bayes: token 'HX-MimeOLE:Produced' => 0.959178732453666
[15528] dbg: bayes: token 'H*r:8.13.5' => 0.958
[15528] dbg: bayes: token 'analyze' => 0.956994275272798
[15528] dbg: bayes: token 'gammonempire' => 0.0489090909090909
[15528] dbg: bayes: token 'Gammonempire' => 0.0489090909090909
[15528] dbg: bayes: token 'kjetil' => 0.0489090909090909
[15528] dbg: bayes: token 'gam' => 0.0489090909090909
[15528] dbg: bayes: token 'Kjetil' => 0.0489090909090909
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:lists.example.com' => 0.950917490471412
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:sk:monty-p' => 0.95091594711816
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:199.232.76.173' => 0.95091594711816
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:envfrom' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:auth' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:helo' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:intl' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:ident' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:rdns' => 0.950880625609595
[15528] dbg: bayes: token 'files' => 0.0495878769699548
[15528] dbg: bayes: token 'H*x:6.00.2900.2869' => 0.949973757633861
[15528] dbg: bayes: token 'H*UA:6.00.2900.2869' => 0.949973757633861
[15528] dbg: bayes: token 'HX-Spam-Relays-External:intl' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:rdns' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:ident' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:envfrom' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:auth' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:helo' => 0.949193158307537
[15528] dbg: bayes: token 'be' => 0.94760196136949
[15528] dbg: bayes: token 'HX-Spam-Relays-External:sk:monty-p' => 0.944475643412366
[15528] dbg: bayes: token 'Be' => 0.942119064159671
[15528] dbg: bayes: token 'version' => 0.0610331703110521
[15528] dbg: bayes: token 'H*r:8.13.8' => 0.0651874814355933
[15528] dbg: bayes: token 'H*F:D*no' => 0.898945798236785
[15528] dbg: bayes: token 'H*RU:intl' => 0.891670002899046
[15528] dbg: bayes: token 'include' => 0.113006581000787
[15528] dbg: bayes: token 'format' => 0.133570931753683
[15528] dbg: bayes: token 'H*c:multipart' => 0.864700569756485
[15528] dbg: bayes: token 'H*r:4.52' => 0.863412384491978
[15528] dbg: bayes: score = 0.999940788449734
[15528] dbg: bayes: DB journal sync: last sync: 1162878216
[15528] dbg: bayes: untie-ing
[15528] dbg: bayes: untie-ing db_toks
[15528] dbg: bayes: untie-ing db_seen
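
In case it helps anyone reading the dump above, here is a minimal sketch of a helper that tallies the `bayes: token` entries from saved debug output. It is not part of SpamAssassin: the function name `summarize_tokens` and the 0.5 cut-off are assumptions chosen purely for illustration, and it only relies on the `token '...' => value` pattern visible in the log.

```python
import re

# Matches debug entries such as:  bayes: token 'H*c:HHH' => 0.992454564805099
TOKEN_RE = re.compile(r"bayes: token '([^']*)' => ([0-9.]+)")


def summarize_tokens(debug_text, cutoff=0.5):
    """Split the tokens found in Bayes debug output into spammy and hammy lists.

    `cutoff` is an arbitrary illustrative threshold, not a SpamAssassin setting.
    """
    spammy, hammy = [], []
    for token, prob in TOKEN_RE.findall(debug_text):
        p = float(prob)
        (spammy if p > cutoff else hammy).append((token, p))
    return spammy, hammy


if __name__ == "__main__":
    # Three entries copied from the debug output above.
    sample = (
        "[15528] dbg: bayes: token 'H*c:alternative' => 0.994304725802302 "
        "[15528] dbg: bayes: token 'kjetil' => 0.0489090909090909 "
        "[15528] dbg: bayes: token 'H*r:8.13.5' => 0.958"
    )
    spammy, hammy = summarize_tokens(sample)
    print(len(spammy), "spammy tokens;", len(hammy), "hammy tokens")
    for token, p in sorted(hammy, key=lambda item: item[1]):
        print(f"  ham  {p:.3f}  {token}")
```

Because the pattern is matched anywhere in the text, the helper works whether the debug output is one entry per line or run together. In the dump above, the tokens below 0.5 are mostly body words, while nearly all of the header-derived tokens sit at the spam end.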