Hi all, Bayes seems to be missing quite a lot of spam. I'm getting these results quite often:
<snip>

Email:    63252  Autolearn: 26740  AvgScore:  14.53  AvgScanTime: 1.69 sec
Spam:     51232  Autolearn: 23252  AvgScore:  21.08  AvgScanTime: 1.68 sec
Ham:      12020  Autolearn:  3488  AvgScore: -13.40  AvgScanTime: 1.72 sec

TOP SPAM RULES FIRED
--------------------------------------------------------------
RANK  RULE NAME                COUNT  %OFMAIL  %OFSPAM  %OFHAM
--------------------------------------------------------------
   1  HTML_MESSAGE             36720    70.25    71.67   64.18
   2  BAYES_99                 35269    56.74    68.84    5.17
   3  URIBL_SBL                32502    54.28    63.44   15.22
   4  URIBL_JP_SURBL           31805    50.70    62.08    2.20
   5  URIBL_SC_SURBL           27524    43.83    53.72    1.65
   6  URIBL_OB_SURBL           22908    36.27    44.71    0.29
   7  RCVD_IN_BL_SPAMCOP_NET   22082    35.55    43.10    3.35
   8  URIBL_AB_SURBL           21789    34.63    42.53    0.96
   9  AWL                      19280    43.57    37.63   68.89
  10  RCVD_IN_XBL              17122    27.09    33.42    0.12
  11  FORGED_RCVD_HELO         15386    28.34    30.03   21.12
  12  RCVD_IN_SORBS_DUL        13501    21.49    26.35    0.74
  13  RCVD_IN_NJABL_DUL        10934    17.37    21.34    0.43
  14  BODY_GAPPY_TEXT          10888    22.04    21.25   25.40
  15  URIBL_WS_SURBL           10615    16.80    20.72    0.08
  16  NO_REAL_NAME              8883    22.63    17.34   45.18
  17  MIME_HTML_ONLY            8226    16.09    16.06   16.21
  18  MSGID_FROM_MTA_ID         7667    13.04    14.97    4.83
  19  BAYES_00                  7445    23.53    14.53   61.87
  20  SUBJ_SPAMWORD             7012    11.56    13.69    2.49
--------------------------------------------------------------
To me, it looks like Bayes_00 is hitting far too much spam.
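To put a number on that, here is a quick back-of-the-envelope check using the figures from the report. It is only a sketch: it assumes the COUNT column is the number of spam messages the rule fired on (that reading is consistent with the %OFSPAM column), and the variable names are made up for illustration.

```python
# Figures copied from the report in this thread.
spam_total = 51232
ham_total = 12020
bayes00_spam_hits = 7445      # COUNT column for BAYES_00 (assumed to be spam hits)
bayes00_pct_of_ham = 61.87    # %OFHAM column for BAYES_00

# Share of all spam that was scored BAYES_00.
pct_of_spam = 100.0 * bayes00_spam_hits / spam_total

# Approximate number of ham messages that hit BAYES_00.
approx_ham_hits = round(ham_total * bayes00_pct_of_ham / 100.0)

print(f"BAYES_00 fired on {pct_of_spam:.2f}% of spam")
print(f"BAYES_00 fired on roughly {approx_ham_hits} ham messages")
```

On these numbers, about one spam message in seven gets the lowest Bayes probability bucket, and the rule fires on nearly as many spam messages as ham messages in absolute terms.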
<snip>

~ $ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0    2110713          0  non-token data: nspam
0.000          0     156758          0  non-token data: nham
0.000          0    1608693          0  non-token data: ntokens
0.000          0 1153323145          0  non-token data: oldest atime
0.000          0 1153446556          0  non-token data: newest atime
0.000          0 1153446557          0  non-token data: last journal sync atime
0.000          0 1153367234          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0    1204872          0  non-token data: last expire reduction count
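For anyone who wants to compare training volumes between installs, the counters in that dump are easy to pull out with a few lines of Python. This is a rough sketch that assumes the layout shown above (the value is the third whitespace-separated column); `parse_dump_magic` is a hypothetical helper, not part of SpamAssassin.

```python
import re

def parse_dump_magic(text):
    """Return {label: value} for the 'non-token data' lines of the dump."""
    pattern = re.compile(
        r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$"
    )
    values = {}
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            values[m.group(2)] = int(m.group(1))
    return values

# Lines copied from the dump in this thread.
SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 2110713 0 non-token data: nspam
0.000 0 156758 0 non-token data: nham
0.000 0 1608693 0 non-token data: ntokens
"""

stats = parse_dump_magic(SAMPLE)
ratio = stats["nspam"] / stats["nham"]
print(f"nspam={stats['nspam']} nham={stats['nham']} spam:ham ~ {ratio:.1f}:1")
```

In practice the text would come from the real command's output rather than a pasted string. For this database the spam count is more than thirteen times the ham count.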
I have fed a large amount of mail into Bayes, and I'm quite certain that it was fed correctly. All of the misses I have checked have hit BAYES_00. Any ideas why this is happening? I have toyed with the idea of lowering the BAYES_00 score. Would anyone care to enlighten me on whether this would be a bad idea, and why?
Methinks you don't have enough mail trained in bayes... take a look at my numbers for hit count, then see how many spam and ham tokens I have in my bayes database.
If more training doesn't correct the scoring, you could lower the score for bayes_00, but mine's untouched.
Regards,
Leigh

Leigh Sharpe
Network Systems Engineer
Pacific Wireless
Ph +61 3 9584 8966
Mob 0408 009 502
email [EMAIL PROTECTED]
web www.pacificwireless.com.au
-Gary