On Mon, 7 Mar 2016, Charles Sprickman wrote:
I’ve been running with some daily training for a little over a week and I’m
seeing less spam in my inbox. I’ve seen a few things slip through because
Bayes tipped them below the default score; these were two phishing emails.
Here are some rule stats for anyone interested:
TOP SPAM RULES FIRED
RANK  RULE NAME             COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  TXREP                 13171      8.47    40.38    91.00   72.91
   2  HTML_MESSAGE          12714      8.18    38.98    87.85   90.80
   3  DCC_CHECK             10593      6.81    32.48    73.19   33.78
   4  RDNS_NONE             10269      6.60    31.48    70.95    5.63
   5  SPF_HELO_PASS         10070      6.48    30.87    69.58   23.41
   6  URIBL_BLACK            9711      6.25    29.77    67.10    1.58
   7  BODY_NEWDOMAIN_FMBLA   9550      6.14    29.28    65.98    1.64
   8  FROM_NEWDOMAIN_FMBLA   9483      6.10    29.07    65.52    1.36
   9  BAYES_99               8486      5.46    26.02    58.63    1.18
  10  BAYES_999              8141      5.24    24.96    56.25    1.06
TOP HAM RULES FIRED
RANK  RULE NAME             COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
   1  HTML_MESSAGE          16473      9.13    50.51    87.85   90.80
   2  DKIM_SIGNED           13776      7.64    42.24    13.81   75.93
   3  TXREP                 13228      7.33    40.56    91.00   72.91
   4  DKIM_VALID            12962      7.19    39.74    11.93   71.44
   5  RCVD_IN_DNSWL_NONE     9941      5.51    30.48     8.08   54.79
   6  DKIM_VALID_AU          8711      4.83    26.71     7.99   48.01
   7  BAYES_00               8390      4.65    25.72     1.84   46.24
   8  RCVD_IN_JMF_W          7369      4.09    22.59     2.54   40.62
   9  RCVD_IN_MSPIKE_WL      6713      3.72    20.58     4.39   37.00
  10  BAYES_50               6201      3.44    19.01    25.56   34.18
Based upon your stats it looks like you need more Bayes training.
Your Bayes 00/99 hits should rank higher in the rules-fired stats and BAYES_50
shouldn't be in the top-10 at all.
(Of course, if you've only been training for a week, that would explain it.)
For example, here are my top-10 hits (for a one-month interval):
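That rule of thumb can be written down as a quick check. A rough sketch in Python, with the rule names typed in by hand from the quoted tables above; the function name `bayes_hints` and the rank cutoff of 3 are made up for illustration and are not anything SpamAssassin provides:

```python
# Rule names in rank order, copied by hand from the quoted
# TOP SPAM / TOP HAM tables above.
top_spam = ["TXREP", "HTML_MESSAGE", "DCC_CHECK", "RDNS_NONE", "SPF_HELO_PASS",
            "URIBL_BLACK", "BODY_NEWDOMAIN_FMBLA", "FROM_NEWDOMAIN_FMBLA",
            "BAYES_99", "BAYES_999"]
top_ham = ["HTML_MESSAGE", "DKIM_SIGNED", "TXREP", "DKIM_VALID",
           "RCVD_IN_DNSWL_NONE", "DKIM_VALID_AU", "BAYES_00",
           "RCVD_IN_JMF_W", "RCVD_IN_MSPIKE_WL", "BAYES_50"]


def bayes_hints(top_spam, top_ham, want_rank=3):
    """Return a list of warnings for a pair of top-N rule lists.

    want_rank is an arbitrary cutoff chosen for this sketch.
    """
    hints = []
    # BAYES_50 means "Bayes has no opinion"; it should not be a top hitter.
    if "BAYES_50" in top_spam or "BAYES_50" in top_ham:
        hints.append("BAYES_50 shows up in a top-N list")
    # BAYES_99 should rank high for spam, BAYES_00 high for ham.
    for rule, table in (("BAYES_99", top_spam), ("BAYES_00", top_ham)):
        rank = table.index(rule) + 1 if rule in table else None
        if rank is None or rank > want_rank:
            hints.append(f"{rule} ranks {rank}, expected within the top {want_rank}")
    return hints


for hint in bayes_hints(top_spam, top_ham):
    print(hint)
```

With the quoted lists this flags all three conditions: BAYES_50 in the ham top-10, BAYES_99 at rank 9 and BAYES_00 at rank 7.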
TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK  RULE NAME              COUNT  %OFMAIL  %OFSPAM  %OFHAM     S/O
----------------------------------------------------------------------
   1  T__BOTNET_NOTRUST     114907    60.32    86.81   42.66  0.5755
   2  BAYES_99              109138    32.98    82.45    0.01  0.9998
   3  BAYES_999             104903    31.70    79.25    0.01  0.9999
   4  HTML_MESSAGE           90850    79.41    68.63   86.59  0.3456
   5  URIBL_BLACK            90845    27.61    68.63    0.27  0.9942
   6  T_QUARANTINE_1         90640    27.40    68.47    0.02  0.9996
   7  URIBL_DBL_SPAM         79152    24.02    59.79    0.17  0.9956
   8  KAM_VERY_BLACK_DBL     74301    22.45    56.13    0.00  1.0000
   9  L_FROM_SPAMMER1k       73667    22.26    55.65    0.00  1.0000
  10  T__RECEIVED_1          72413    42.60    54.70   34.54  0.5135
TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK  RULE NAME              COUNT  %OFMAIL  %OFSPAM  %OFHAM     S/O
----------------------------------------------------------------------
   1  BAYES_00              182674    56.03     2.11   91.97  0.0150
   2  HTML_MESSAGE          171992    79.41    68.63   86.59  0.3456
   3  SPF_PASS              136623    63.08    54.52   68.78  0.3457
   4  T_RP_MATCHES_RCVD     130879    53.75    35.54   65.89  0.2644
   5  T__RECEIVED_2         125492    53.76    39.62   63.18  0.2947
   6  DKIM_SIGNED           114808    38.57     9.72   57.80  0.1008
   7  DKIM_VALID            105385    34.70     7.16   53.06  0.0825
   8  RCVD_IN_DNSWL_NONE     92951    29.90     4.56   46.80  0.0609
   9  T__BOTNET_NOTRUST      84741    60.32    86.81   42.66  0.5755
  10  KHOP_RCVD_TRUST        84623    26.44     2.19   42.60  0.0331
Note how highly BAYES_00 and BAYES_99 rank. What you don't see is that
BAYES_50 is way down in the mud (ranked below 50th).
BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
hand-feed corner cases that get misclassified (usually things like phishes, or
conference announcements that can look shaky).
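A note on the S/O column in the tables above: judging from the counts, it appears to be a rule's spam hits divided by its total hits (spam plus ham), so values near 1 mean the rule fires almost only on spam and values near 0 almost only on ham. A minimal sketch that checks this reading against the two rules that appear in both the spam and ham tables; `spam_over_overall` is a hypothetical helper name, not part of any SpamAssassin tool:

```python
def spam_over_overall(spam_hits, ham_hits):
    """Fraction of a rule's hits that landed on spam (assumed meaning of S/O)."""
    total = spam_hits + ham_hits
    return spam_hits / total if total else 0.0


# (spam COUNT, ham COUNT) taken from the TOP SPAM and TOP HAM tables above.
counts = {
    "HTML_MESSAGE": (90850, 171992),
    "T__BOTNET_NOTRUST": (114907, 84741),
}

for rule, (spam_hits, ham_hits) in counts.items():
    print(rule, f"{spam_over_overall(spam_hits, ham_hits):.4f}")
```

Both results agree with the S/O values listed for those rules (0.3456 and 0.5755), which is why the count-based reading is used here rather than a ratio of the %OFSPAM and %OFHAM columns.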
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{