Hi all,
Bayes seems to be missing quite a lot of spam. I'm getting these
results quite often:
<snip>
Email: 63252 Autolearn: 26740 AvgScore: 14.53 AvgScanTime: 1.69 sec
Spam: 51232 Autolearn: 23252 AvgScore: 21.08 AvgScanTime: 1.68 sec
Ham: 12020 Autolearn: 3488 AvgScore: -13.40 AvgScanTime: 1.72 sec
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK  RULE NAME                 COUNT  %OFMAIL  %OFSPAM  %OFHAM
----------------------------------------------------------------------
   1  HTML_MESSAGE              36720    70.25    71.67   64.18
   2  BAYES_99                  35269    56.74    68.84    5.17
   3  URIBL_SBL                 32502    54.28    63.44   15.22
   4  URIBL_JP_SURBL            31805    50.70    62.08    2.20
   5  URIBL_SC_SURBL            27524    43.83    53.72    1.65
   6  URIBL_OB_SURBL            22908    36.27    44.71    0.29
   7  RCVD_IN_BL_SPAMCOP_NET    22082    35.55    43.10    3.35
   8  URIBL_AB_SURBL            21789    34.63    42.53    0.96
   9  AWL                       19280    43.57    37.63   68.89
  10  RCVD_IN_XBL               17122    27.09    33.42    0.12
  11  FORGED_RCVD_HELO          15386    28.34    30.03   21.12
  12  RCVD_IN_SORBS_DUL         13501    21.49    26.35    0.74
  13  RCVD_IN_NJABL_DUL         10934    17.37    21.34    0.43
  14  BODY_GAPPY_TEXT           10888    22.04    21.25   25.40
  15  URIBL_WS_SURBL            10615    16.80    20.72    0.08
  16  NO_REAL_NAME               8883    22.63    17.34   45.18
  17  MIME_HTML_ONLY             8226    16.09    16.06   16.21
  18  MSGID_FROM_MTA_ID          7667    13.04    14.97    4.83
  19  BAYES_00                   7445    23.53    14.53   61.87
  20  SUBJ_SPAMWORD              7012    11.56    13.69    2.49
----------------------------------------------------------------------
To me, it looks like BAYES_00 is hitting far too much spam (14.53 in the %OFSPAM column above).
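For anyone wanting to double-check the percentages, here is a minimal sketch that recomputes the %OFSPAM figures for the two Bayes rules from the raw numbers in the report. It assumes the COUNT column counts hits on spam only (dividing by the Spam total of 51232 does reproduce the %OFSPAM values); the helper name pct_of_spam is made up for illustration.

```python
# Figures copied from the report above.
total_spam = 51232                       # "Spam: 51232" line
rule_hits = {"BAYES_99": 35269,          # COUNT column, assumed to be spam hits
             "BAYES_00": 7445}


def pct_of_spam(hits, total=total_spam):
    """Return hits as a percentage of all spam, rounded to two decimals."""
    return round(100.0 * hits / total, 2)


for rule, hits in rule_hits.items():
    print(f"{rule}: {hits} of {total_spam} spam messages = {pct_of_spam(hits)}%")
```

So roughly one spam message in seven is being classified by Bayes as near-certain ham.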
<snip>
~ $ sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 2110713 0 non-token data: nspam
0.000 0 156758 0 non-token data: nham
0.000 0 1608693 0 non-token data: ntokens
0.000 0 1153323145 0 non-token data: oldest atime
0.000 0 1153446556 0 non-token data: newest atime
0.000 0 1153446557 0 non-token data: last journal sync atime
0.000 0 1153367234 0 non-token data: last expiry atime
0.000 0 43200 0 non-token data: last expire atime delta
0.000 0 1204872 0 non-token data: last expire reduction count
I have fed a large amount of mail into Bayes (see the dump above),
and I'm quite certain that it was fed correctly.
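To put the dump numbers side by side, here is a rough sketch that parses the "non-token data" lines and reports the spam-to-ham training ratio and the span between the oldest and newest token atime. The regular expression and the function name parse_magic are my own assumptions based on the layout of the output pasted above, not anything sa-learn provides.

```python
import re

# Lines copied from the `sa-learn --dump magic` output above.
DUMP = """\
0.000 0 2110713 0 non-token data: nspam
0.000 0 156758 0 non-token data: nham
0.000 0 1608693 0 non-token data: ntokens
0.000 0 1153323145 0 non-token data: oldest atime
0.000 0 1153446556 0 non-token data: newest atime
"""

# Assumed layout: four whitespace-separated fields, then "non-token data: <label>".
LINE_RE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$")


def parse_magic(text):
    """Map each non-token label (e.g. 'nspam') to its integer value."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


magic = parse_magic(DUMP)
ratio = magic["nspam"] / magic["nham"]
atime_span_hours = (magic["newest atime"] - magic["oldest atime"]) / 3600.0

print(f"spam learned: {magic['nspam']}, ham learned: {magic['nham']}")
print(f"spam:ham ratio is about {ratio:.1f} to 1")
print(f"token atimes span about {atime_span_hours:.1f} hours")
```

With these figures the database has seen about 13 spam messages for every ham message, and the token atimes cover a little over a day.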
All of the misses I have checked have hit Bayes_00.
Any ideas why this is happening? I have toyed with the idea of lowering
the bayes_00 score. Anyone care to enlighten me on whether this would be
a bad idea and why?
Methinks you don't have enough mail trained in bayes... take a look at my
numbers for hit count, then see how many spam and ham tokens I have in my
bayes database.
If more training doesn't correct the scoring, you could lower the score
for bayes_00, but mine's untouched.
Regards,
Leigh
Leigh Sharpe
Network Systems Engineer
Pacific Wireless
Ph +61 3 9584 8966
Mob 0408 009 502
email [EMAIL PROTECTED]
web www.pacificwireless.com.au
-Gary