On Mon, 7 Mar 2016, Charles Sprickman wrote:

I’ve been running with some daily training for a little over a week and I’m 
seeing less spam in my inbox.  I’ve seen a few things slip through because 
Bayes tipped them below the default score; these were two phishing emails.

Here’s some rule stats for anyone interested:

TOP SPAM RULES FIRED

RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

  1     TXREP                           13171     8.47   40.38   91.00   72.91
  2     HTML_MESSAGE                    12714     8.18   38.98   87.85   90.80
  3     DCC_CHECK                       10593     6.81   32.48   73.19   33.78
  4     RDNS_NONE                       10269     6.60   31.48   70.95    5.63
  5     SPF_HELO_PASS                   10070     6.48   30.87   69.58   23.41
  6     URIBL_BLACK                      9711     6.25   29.77   67.10    1.58
  7     BODY_NEWDOMAIN_FMBLA             9550     6.14   29.28   65.98    1.64
  8     FROM_NEWDOMAIN_FMBLA             9483     6.10   29.07   65.52    1.36
  9     BAYES_99                         8486     5.46   26.02   58.63    1.18
 10     BAYES_999                        8141     5.24   24.96   56.25    1.06

TOP HAM RULES FIRED

RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM

  1     HTML_MESSAGE                    16473     9.13   50.51   87.85   90.80
  2     DKIM_SIGNED                     13776     7.64   42.24   13.81   75.93
  3     TXREP                           13228     7.33   40.56   91.00   72.91
  4     DKIM_VALID                      12962     7.19   39.74   11.93   71.44
  5     RCVD_IN_DNSWL_NONE               9941     5.51   30.48    8.08   54.79
  6     DKIM_VALID_AU                    8711     4.83   26.71    7.99   48.01
  7     BAYES_00                         8390     4.65   25.72    1.84   46.24
  8     RCVD_IN_JMF_W                    7369     4.09   22.59    2.54   40.62
  9     RCVD_IN_MSPIKE_WL                6713     3.72   20.58    4.39   37.00
 10     BAYES_50                         6201     3.44   19.01   25.56   34.18


Based upon your stats, it looks like you need more Bayes training. Your BAYES_00/BAYES_99 hits should rank higher in the rules-fired stats, and BAYES_50 shouldn't be in the top 10 at all.
(Of course, if you've only been training for a week, that would explain it.)
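
As a rough illustration of that check, here is a minimal Python sketch that reads rows shaped like the stats listings in this thread and reports where the Bayes rules rank. The function names (rule_ranks, bayes_report) and the top_n parameter are made up for this example and are not part of SpamAssassin or any stats script; the two sample rows are copied from the ham table quoted above.

```python
def rule_ranks(stats_lines):
    """Map rule name -> rank for rows shaped like 'RANK NAME COUNT ...'."""
    ranks = {}
    for line in stats_lines:
        fields = line.split()
        # Data rows start with an integer rank; header and ruler lines do not.
        if len(fields) >= 3 and fields[0].isdigit():
            ranks[fields[1]] = int(fields[0])
    return ranks


def bayes_report(ham_lines, top_n=10):
    """Summarise where BAYES_00 and BAYES_50 sit in a ham listing."""
    ranks = rule_ranks(ham_lines)
    return {
        "BAYES_00_rank": ranks.get("BAYES_00"),
        # A rule missing from the listing counts as outside the top N.
        "BAYES_50_in_top": ranks.get("BAYES_50", top_n + 1) <= top_n,
    }


# Two rows copied from the quoted TOP HAM RULES FIRED table.
sample_ham = [
    "RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM",
    "  7     BAYES_00                         8390     4.65   25.72    1.84   46.24",
    " 10     BAYES_50                         6201     3.44   19.01   25.56   34.18",
]
report = bayes_report(sample_ham)
print(report)
```

With the quoted rows, BAYES_00 comes out at rank 7 and BAYES_50 lands inside the top 10, which is the pattern described above.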

For example, here are my top-10 hits (over a one-month interval):

TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
----------------------------------------------------------------------
   1    T__BOTNET_NOTRUST               114907   60.32   86.81   42.66  0.5755
   2    BAYES_99                        109138   32.98   82.45    0.01  0.9998
   3    BAYES_999                       104903   31.70   79.25    0.01  0.9999
   4    HTML_MESSAGE                    90850    79.41   68.63   86.59  0.3456
   5    URIBL_BLACK                     90845    27.61   68.63    0.27  0.9942
   6    T_QUARANTINE_1                  90640    27.40   68.47    0.02  0.9996
   7    URIBL_DBL_SPAM                  79152    24.02   59.79    0.17  0.9956
   8    KAM_VERY_BLACK_DBL              74301    22.45   56.13    0.00  1.0000
   9    L_FROM_SPAMMER1k                73667    22.26   55.65    0.00  1.0000
  10    T__RECEIVED_1                   72413    42.60   54.70   34.54  0.5135

TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME                       COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
----------------------------------------------------------------------
   1    BAYES_00                        182674   56.03    2.11   91.97  0.0150
   2    HTML_MESSAGE                    171992   79.41   68.63   86.59  0.3456
   3    SPF_PASS                        136623   63.08   54.52   68.78  0.3457
   4    T_RP_MATCHES_RCVD               130879   53.75   35.54   65.89  0.2644
   5    T__RECEIVED_2                   125492   53.76   39.62   63.18  0.2947
   6    DKIM_SIGNED                     114808   38.57    9.72   57.80  0.1008
   7    DKIM_VALID                      105385   34.70    7.16   53.06  0.0825
   8    RCVD_IN_DNSWL_NONE              92951    29.90    4.56   46.80  0.0609
   9    T__BOTNET_NOTRUST               84741    60.32   86.81   42.66  0.5755
  10    KHOP_RCVD_TRUST                 84623    26.44    2.19   42.60  0.0331

Note how highly BAYES_00 and BAYES_99 rank. What you don't see is that BAYES_50 is way down in the mud (ranked below 50th).
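
The S/O column is not defined anywhere in this thread, but the figures in the tables above are consistent with reading it as spam hits divided by total hits for the rule. A small Python sketch under that assumption follows; the function name spam_over_overall is hypothetical, and the counts are the COUNT values from the two tables above.

```python
def spam_over_overall(spam_hits, ham_hits):
    """Assumed S/O: fraction of a rule's hits that landed on spam."""
    total = spam_hits + ham_hits
    if total == 0:
        raise ValueError("rule never fired")
    return spam_hits / total


# COUNT from the spam table, then COUNT from the ham table, for the same rule.
html_message = spam_over_overall(90850, 171992)
botnet_notrust = spam_over_overall(114907, 84741)

print(f"HTML_MESSAGE      {html_message:.4f}")    # table lists 0.3456
print(f"T__BOTNET_NOTRUST {botnet_notrust:.4f}")  # table lists 0.5755
```

Read this way, a value near 1.0 means the rule fires almost only on spam, a value near 0.0 means almost only on ham, and values near the middle (as for HTML_MESSAGE) say little on their own.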

BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
hand-feed corner cases that get misclassified (usually things like phishes, or conference announcements that can look shaky).


--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
