After working brilliantly for a long time, the Bayes engine has suddenly
started misclassifying messages and producing false positives.

Getting a false positive every now and again is nothing new; when one
occurred I would train on error and that corrected the issue.  The
problem I am seeing now is that when I train the message as ham, the
Bayes engine does not learn it and still rates the message as 99%
likely to be spam.

  | sa-learn --ham    # no significant effect now
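A rough illustration of why one round of training on error may not be enough. This is a minimal Python sketch that assumes a Robinson-style per-token estimate: token counts are divided by the corpus sizes and then shrunk toward a neutral prior. The constants S and X and the hit counts are hypothetical. Only nspam and nham come from the debug log.

```python
# Sketch of a Robinson-style per-token spam probability.
# S and X are illustrative assumptions, not SpamAssassin's actual constants.
S = 0.16  # weight given to the prior (assumed)
X = 0.5   # prior probability for a token with no history (assumed)

def token_prob(spam_hits, ham_hits, nspam, nham):
    """Estimate P(spam | token) from how often the token appeared in each corpus."""
    if spam_hits + ham_hits == 0:
        return X  # never seen: fall back to the prior
    # Divide by corpus size so the spam:ham imbalance does not skew the ratio.
    spam_ratio = spam_hits / nspam
    ham_ratio = ham_hits / nham
    raw = spam_ratio / (spam_ratio + ham_ratio)
    # Shrink toward X; rarely seen tokens stay close to the prior.
    n = spam_hits + ham_hits
    return (S * X + n * raw) / (S + n)

NSPAM, NHAM = 95535, 38741  # corpus sizes reported in the debug log

# Hypothetical counts for a header token such as 'H*x:Outlook'.
before = token_prob(20000, 150, NSPAM, NHAM)
after = token_prob(20000, 151, NSPAM, NHAM)  # one more ham message trained
print(f"before={before:.5f} after={after:.5f} change={before - after:.5f}")
```

With counts like these, one extra ham hit moves the token by roughly 0.0001. The message in the log carries dozens of such header tokens, so each of them would need a lot of ham training before the overall score drops.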

Any clues as to why it is refusing to learn the message as non-spam?

More importantly, what steps should I take to correct the situation
and get the Bayes engine tuned up?

Is it just being overwhelmed by the number of spam messages (nspam =
95535 versus nham = 38741 in the debug output below) without enough ham
to keep things balanced?
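On the balance question, here is a quick arithmetic check in Python. It assumes that a token's spam and ham counts are divided by nspam and nham before they are compared. That is my understanding of how SpamAssassin's Bayes code works, but I have not verified it here. The token counts are hypothetical. Only the corpus sizes come from the debug log.

```python
# Quick check: does a spam-heavy corpus by itself push tokens toward spam?
NSPAM, NHAM = 95535, 38741  # corpus sizes reported in the debug log

def raw_prob(spam_hits, ham_hits):
    """Naive estimate that ignores corpus sizes (for comparison only)."""
    return spam_hits / (spam_hits + ham_hits)

def normalized_prob(spam_hits, ham_hits, nspam=NSPAM, nham=NHAM):
    """Compare the fraction of each corpus containing the token (assumed method)."""
    spam_ratio = spam_hits / nspam
    ham_ratio = ham_hits / nham
    return spam_ratio / (spam_ratio + ham_ratio)

# Hypothetical neutral token: present in 10% of the spam and 10% of the ham.
spam_hits, ham_hits = NSPAM // 10, NHAM // 10

print(f"spam:ham corpus ratio = {NSPAM / NHAM:.2f}")
print(f"raw estimate = {raw_prob(spam_hits, ham_hits):.3f}")
print(f"normalized estimate = {normalized_prob(spam_hits, ham_hits):.3f}")
```

Under that assumption, a token that is equally common in both corpora comes out at 0.5 despite the roughly 2.5:1 imbalance. The unnormalized estimate would read about 0.71. If this holds, the imbalance alone would not produce 0.99 scores. What would matter is which individual tokens have accumulated mostly in spam.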

Thanks
Bob

Mail addresses below have been lightly obfuscated by hand.

[15528] dbg: bayes: tie-ing to DB file R/O /home/mlmgr/.spamassassin/bayes_toks
[15528] dbg: bayes: tie-ing to DB file R/O /home/mlmgr/.spamassassin/bayes_seen
[15528] dbg: bayes: found bayes db version 3
[15528] dbg: bayes: DB journal sync: last sync: 1162878184
[15528] dbg: bayes: DB journal sync: last sync: 1162878184
[15528] dbg: bayes: corpus size: nspam = 95535, nham = 38741
[15528] dbg: bayes: header tokens for Envelope-to = " [EMAIL PROTECTED]"
[15528] dbg: bayes: header tokens for *M = " OEA0023 OEB4178 OEC0a01a8c0 kjetil3c0c3416 "
[15528] dbg: bayes: header tokens for *F = "U*kjetb D*online.no D*no"
[15528] dbg: bayes: header tokens for To = "U*bug-list D*bogus.example.com D*com"
[15528] dbg: bayes: header tokens for MIME-Version = " "
[15528] dbg: bayes: header tokens for *c = " multipart/alternative; ----=_ NHxtPHrt _ HHH _ HHHH _ HHHHHHHH . HHHHHHHH"
[15528] dbg: bayes: header tokens for X-Priority = " 3"
[15528] dbg: bayes: header tokens for X-MSMail-Priority = " Normal"
[15528] dbg: bayes: header tokens for *x = " Microsoft Outlook Express 6.00.2900.2869"
[15528] dbg: bayes: header tokens for Disposition-Notification-To = " User Name <[EMAIL PROTECTED]>"
[15528] dbg: bayes: header tokens for X-MimeOLE = " Produced By Microsoft MimeOLE V6.00.2900.2962"
[15528] dbg: bayes: header tokens for X-Spam-Relays-External = " [ ip=193.213.115.46 rdns=mail46.e.nsc.no helo=mail46.e.nsc.no by=monty-python.example.com ident= envfrom= intl=0 id=1GfOG5-0006Pw-Uh auth= ] [ ip=85.164.226.109 rdns=ti400720a081-8813.bb.online.no helo=kjetil3c0c3416 by=mail46.nsc.no ident= envfrom= intl=0 id=kA1M8Nuo009386 auth= ]"
[15528] dbg: bayes: header tokens for X-Spam-Relays-Internal = " [ ip=199.232.76.173 rdns=monty-python.example.com helo=monty-python.example.com by=lists.example.com ident= envfrom= intl=1 id=1GfOG6-0002kO-Gu auth= ]"
[15528] dbg: bayes: header tokens for *RT = " [ ip=199.232.76.173 rdns=monty-python.example.com helo=monty-python.example.com by=lists.example.com ident= envfrom= intl=1 id=1GfOG6-0002kO-Gu auth= ]"
[15528] dbg: bayes: header tokens for *RU = " [ ip=193.213.115.46 rdns=mail46.e.nsc.no helo=mail46.e.nsc.no by=monty-python.example.com ident= envfrom= intl=0 id=1GfOG5-0006Pw-Uh auth= ] [ ip=85.164.226.109 rdns=ti400720a081-8813.bb.online.no helo=kjetil3c0c3416 by=mail46.nsc.no ident= envfrom= intl=0 id=kA1M8Nuo009386 auth= ]"
[15528] dbg: bayes: header tokens for *r = " kjetil3c0c3416 (ti400720a081-8813.bb.online.no [85.164.226 ip*85.164.226.109 ]) by mail46.nsc.no (8.13.8/8.13.5) <[EMAIL PROTECTED]>; "
[15528] dbg: bayes: header tokens for *r = " kjetil3c0c3416 (ti400720a081-8813.example.com [85.164.226 ip*85.164.226.109 ]) by mail46.example.com (8.13.8/8.13.5) <[EMAIL PROTECTED]>; [193.213.115 ip*193.213.115.46 ] (helo=mail46.example.com) by monty-python.example.com esmtp (Exim 4.52) id 1GfOG5-0006Pw-Uh [EMAIL PROTECTED]; "
[15528] dbg: bayes: token 'H*c:alternative' => 0.994304725802302
[15528] dbg: bayes: token 'H*c:HHH' => 0.992454564805099
[15528] dbg: bayes: token 'H*c:NHxtPHrt' => 0.991016151567721
[15528] dbg: bayes: token 'H*c:HHHH' => 0.985263624445525
[15528] dbg: bayes: token 'H*MI:OEA0023' => 0.985096774193548
[15528] dbg: bayes: token 'H*M:OEA0023' => 0.985096774193548
[15528] dbg: bayes: token 'H*UA:Express' => 0.985060557114832
[15528] dbg: bayes: token 'H*x:Express' => 0.985059973253254
[15528] dbg: bayes: token 'H*c:HHHHHHHH' => 0.983903973265669
[15528] dbg: bayes: token 'H*UA:Microsoft' => 0.978901128935159
[15528] dbg: bayes: token 'H*x:Microsoft' => 0.978777767154235
[15528] dbg: bayes: token 'H*x:Outlook' => 0.978777376366316
[15528] dbg: bayes: token 'H*UA:Outlook' => 0.978777376366316
[15528] dbg: bayes: token 'HX-MimeOLE:V6.00.2900.2962' => 0.976898908840907
[15528] dbg: bayes: token 'HX-MimeOLE:MimeOLE' => 0.976313886128059
[15528] dbg: bayes: token 'HX-MSMail-Priority:Normal' => 0.974305670960733
[15528] dbg: bayes: token 'HX-MimeOLE:Microsoft' => 0.959224439139177
[15528] dbg: bayes: token 'HX-MimeOLE:Produced' => 0.959178732453666
[15528] dbg: bayes: token 'H*r:8.13.5' => 0.958
[15528] dbg: bayes: token 'analyze' => 0.956994275272798
[15528] dbg: bayes: token 'gammonempire' => 0.0489090909090909
[15528] dbg: bayes: token 'Gammonempire' => 0.0489090909090909
[15528] dbg: bayes: token 'kjetil' => 0.0489090909090909
[15528] dbg: bayes: token 'gam' => 0.0489090909090909
[15528] dbg: bayes: token 'Kjetil' => 0.0489090909090909
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:lists.example.com' => 0.950917490471412
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:sk:monty-p' => 0.95091594711816
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:199.232.76.173' => 0.95091594711816
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:envfrom' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:auth' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:helo' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:intl' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:ident' => 0.950880625609595
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:rdns' => 0.950880625609595
[15528] dbg: bayes: token 'files' => 0.0495878769699548
[15528] dbg: bayes: token 'H*x:6.00.2900.2869' => 0.949973757633861
[15528] dbg: bayes: token 'H*UA:6.00.2900.2869' => 0.949973757633861
[15528] dbg: bayes: token 'HX-Spam-Relays-External:intl' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:rdns' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:ident' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:envfrom' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:auth' => 0.949193158307537
[15528] dbg: bayes: token 'HX-Spam-Relays-External:helo' => 0.949193158307537
[15528] dbg: bayes: token 'be' => 0.94760196136949
[15528] dbg: bayes: token 'HX-Spam-Relays-External:sk:monty-p' => 0.944475643412366
[15528] dbg: bayes: token 'Be' => 0.942119064159671
[15528] dbg: bayes: token 'version' => 0.0610331703110521
[15528] dbg: bayes: token 'H*r:8.13.8' => 0.0651874814355933
[15528] dbg: bayes: token 'H*F:D*no' => 0.898945798236785
[15528] dbg: bayes: token 'H*RU:intl' => 0.891670002899046
[15528] dbg: bayes: token 'include' => 0.113006581000787
[15528] dbg: bayes: token 'format' => 0.133570931753683
[15528] dbg: bayes: token 'H*c:multipart' => 0.864700569756485
[15528] dbg: bayes: token 'H*r:4.52' => 0.863412384491978
[15528] dbg: bayes: score = 0.999940788449734
[15528] dbg: bayes: DB journal sync: last sync: 1162878216
[15528] dbg: bayes: untie-ing
[15528] dbg: bayes: untie-ing db_toks
[15528] dbg: bayes: untie-ing db_seen
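To see which tokens drive the 0.99994 score, here is a Python sketch with two parts. It pulls the `token '...' => p` pairs out of debug output like the above. It then combines the probabilities with Fisher/Robinson chi-square combining, the method popularised by SpamBayes. My understanding is that SpamAssassin's Bayes uses a chi-square combiner of this kind, but this is not its code. Token selection and clamping differ, so the printed number will not match the log exactly. The header-token test (`HEADER_RE`) is a heuristic assumption. `PROBS` holds the log's 54 token probabilities rounded to three decimals.

```python
import math
import re

# Matches "token 'NAME' => PROB"; \s* tolerates entries wrapped before the number.
TOKEN_RE = re.compile(r"token '([^']*)' =>\s*([0-9.]+)")
# Heuristic (assumption): header tokens look like "H<header>:<value>", e.g. "H*c:HHH".
HEADER_RE = re.compile(r"^H[^:]*:")

def parse_tokens(text):
    """Return (token, probability) pairs from 'dbg: bayes: token' debug lines."""
    return [(m.group(1), float(m.group(2))) for m in TOKEN_RE.finditer(text)]

def chi2q(x2, v):
    """Upper-tail probability of a chi-square variable with even degrees of freedom v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi_combine(probs):
    """Fisher/Robinson chi-square combining of per-token spam probabilities."""
    # Clamp to avoid log(0); this bound is an illustrative choice.
    probs = [min(max(p, 1e-6), 1.0 - 1e-6) for p in probs]
    n = len(probs)
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0

# Short excerpt of the debug output; the last entry is wrapped the way
# mail clients often wrap long log lines.
LOG = """\
[15528] dbg: bayes: token 'H*c:alternative' => 0.994304725802302
[15528] dbg: bayes: token 'H*UA:Express' => 0.985060557114832
[15528] dbg: bayes: token 'analyze' => 0.956994275272798
[15528] dbg: bayes: token 'kjetil' => 0.0489090909090909
[15528] dbg: bayes: token 'HX-Spam-Relays-Internal:lists.example.com' =>
0.950917490471412
"""

# All 54 token probabilities from the debug output, rounded to three decimals.
PROBS = [
    .994, .992, .991, .985, .985, .985, .985, .985, .984, .979,
    .979, .979, .979, .977, .976, .974, .959, .959, .958, .957,
    .049, .049, .049, .049, .049, .951, .951, .951, .951, .951,
    .951, .951, .951, .951, .050, .950, .950, .949, .949, .949,
    .949, .949, .949, .948, .944, .942, .061, .065, .899, .892,
    .113, .134, .865, .863,
]

tokens = parse_tokens(LOG)
spammy = [t for t, p in tokens if p > 0.5]
headers = [t for t in spammy if HEADER_RE.match(t)]
print(f"{len(headers)} of {len(spammy)} spam-leaning tokens in the excerpt are header tokens")
print(f"combined score for the full list: {chi_combine(PROBS):.6f}")
```

In the excerpt, three of the four spam-leaning tokens are header tokens. Across the full list, the ten hammy tokens (such as 'kjetil', 'gammonempire', 'files' and 'version') cannot offset the forty-odd spam-leaning ones. Nearly all of those come from the Outlook Express, MIME boundary and relay headers. That is consistent with a single ham training having little visible effect.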
