My SA 3.10 is working very well, and now
I am looking to tune some of the rules,
especially those 'acquired' from SARE and
others who publish rule sets.

The only tool I am using is sa-stats (I
believe there are several programs with this
name):

# file: sa-stats.pl
# date: 2005-07-27
# version: 0.9
# author: Dallas Engelken <[EMAIL PROTECTED]>
# desc: SA 3.x log parser

...along with grepping for various patterns in
both log messages and in the sa-stats output.*

I could start modifying this script (e.g., to include
the scores from the .cf files), but perhaps someone
has good suggestions for managing and re-scoring
rules, especially add-on ones?
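
As a starting point for pulling scores out of the .cf files, here is a
minimal Python sketch. It assumes the usual "score RULE_NAME value"
lines (one value, or four values for the different score sets); the
function name parse_scores is made up for this example and is not part
of sa-stats.pl or SpamAssassin.

```python
import re

# Matches lines such as "score J_CHICKENPOX_53 0.6" or
# "score SOME_RULE 0 1.2 0 1.5", with an optional trailing comment.
SCORE_LINE = re.compile(r"^\s*score\s+(\S+)\s+(.+?)\s*(?:#.*)?$")


def parse_scores(cf_text):
    """Return {rule_name: [float, ...]} from the text of a .cf file."""
    scores = {}
    for line in cf_text.splitlines():
        m = SCORE_LINE.match(line)
        if not m:
            continue  # not a score line (or a commented-out one)
        name, values = m.groups()
        try:
            scores[name] = [float(v) for v in values.split()]
        except ValueError:
            # Skip anything that is not a plain list of numbers.
            continue
    return scores


# Illustrative input only; the 0.6 is the score mentioned in this post.
example = "score J_CHICKENPOX_53 0.6\n"
print(parse_scores(example))
```

The resulting dictionary could then be joined against the rule names
in the sa-stats output.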

Example:
grep to find rules that hit EITHER no Spam or
no Ham and decide if they are scoring on the correct
side, and at the correct level:

The following are Ham hits that match the pattern " 0.00",
i.e. rules with no Spam hits at all
(from the 2000 most recent spamd log entries):

RANK    RULE NAME              COUNT %OFRULES %OFMAIL  %OFSPAM    %OFHAM
   9    USER_IN_WHITELIST_TO     202     1.87   10.10    0.000  12.16867
  10    USER_IN_WHITELIST        159     1.47    7.95    0.000   9.57831
  12    DK_SIGNED                154     1.43    7.70    0.000   9.27711
  24    RCVD_IN_BSP_TRUSTED       47     0.43    2.35    0.000   2.83133
  49    Y_GAPPY_DASHES5           18     0.17    0.90    0.000   1.08434
  50    TW_YG                     18     0.17    0.90    0.000   1.08434
  52    FU_QUE_NO_SLASH           18     0.17    0.90    0.000   1.08434
  58    FR_DIV_CLEAR              16     0.15    0.80    0.000   0.96386
  67    HTML_TINY_FONT            15     0.14    0.75    0.000   0.90361
  70    DK_VERIFIED               14     0.13    0.70    0.000   0.84337
  71    TW_DH                     14     0.13    0.70    0.000   0.84337
  73    J_CHICKENPOX_53           14     0.13    0.70    0.000   0.84337
  77    SARE_MSGID_LONG40         13     0.12    0.65    0.000   0.78313
  78    FH_MSGID_HUGE_40          13     0.12    0.65    0.000   0.78313
  79    J_CHICKENPOX_75           13     0.12    0.65    0.000   0.78313
  82    SARE_HTML_HEAD_EMPTY      12     0.11    0.60    0.000   0.72289
  84    FR_HEAD_EMPTY             12     0.11    0.60    0.000   0.72289
  85    FS_OBFU_Q1                12     0.11    0.60    0.000   0.72289
  89    J_CHICKENPOX_19           11     0.10    0.55    0.000   0.66265
  92    Y_BEST_UPPERCASE          11     0.10    0.55    0.000   0.66265
  95    J_CHICKENPOX_43           10     0.09    0.50    0.000   0.60241
  96    HTML_TITLE_SUBJ_DIFF      10     0.09    0.50    0.000   0.60241
 100    RCVD_DOUBLE_IP_LOOSE      10     0.09    0.50    0.000   0.60241
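
The grep step could also be done in a few lines of Python. This is
only a sketch: it assumes each row has seven whitespace-separated
fields and that the last two are the percentage of spam and the
percentage of ham (an assumption about the column layout). The
function names are invented for this example.

```python
def parse_stats_row(line):
    """Split one sa-stats row into a dict, or return None if it does not fit."""
    parts = line.split()
    if len(parts) != 7:
        return None
    try:
        return {
            "rank": int(parts[0]),
            "rule": parts[1],
            "count": int(parts[2]),
            "pct_rules": float(parts[3]),
            "pct_mail": float(parts[4]),
            "pct_spam": float(parts[5]),  # assumed column meaning
            "pct_ham": float(parts[6]),   # assumed column meaning
        }
    except ValueError:
        return None  # header line, separator, or other non-data text


def one_sided_rules(report_text):
    """Return (ham_only, spam_only) lists of rule names."""
    ham_only, spam_only = [], []
    for line in report_text.splitlines():
        row = parse_stats_row(line)
        if row is None:
            continue
        if row["pct_spam"] == 0 and row["pct_ham"] > 0:
            ham_only.append(row["rule"])
        elif row["pct_ham"] == 0 and row["pct_spam"] > 0:
            spam_only.append(row["rule"])
    return ham_only, spam_only


# Two rows copied from the listing above.
sample = """\
  12    DK_SIGNED                154     1.43    7.70    0.000   9.27711
  73    J_CHICKENPOX_53           14     0.13    0.70    0.000   0.84337
"""
ham_only, spam_only = one_sided_rules(sample)
print(ham_only)  # ['DK_SIGNED', 'J_CHICKENPOX_53']
```

Unlike a plain grep for " 0.00", this only looks at the two
percentage columns, so a zero elsewhere in the row cannot cause a
false match.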

E.g., most of those J_ (chickenpox etc.) rules are scored
positively (0.6), yet this subset is hitting strictly
ham (and no spam).

I am thinking of reducing (or eliminating) those scores and
considering a boost for rules that score correctly, like DK_SIGNED....
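
To make that review less manual, something along these lines might
help. It is a sketch only: it takes a list of ham-only rule names and
a {rule: score} dictionary, both of which would have to come from
elsewhere, and the function name is invented. The DK_SIGNED score in
the example is a placeholder, not a value from any real rule file.

```python
def review_ham_only_rules(ham_only, scores):
    """Sort ham-only rules by which way their score pushes a message.

    ham_only: iterable of rule names that hit ham but no spam.
    scores:   {rule_name: score}; rules missing from it are 'unknown'.
    Returns (wrong_side, right_side, unknown).
    """
    wrong_side, right_side, unknown = [], [], []
    for rule in ham_only:
        score = scores.get(rule)
        if score is None:
            unknown.append(rule)
        elif score > 0:
            # Positive score on a rule that only ever hits ham:
            # a candidate for reducing or zeroing.
            wrong_side.append((rule, score))
        else:
            # Zero or negative: not penalising ham.
            right_side.append((rule, score))
    return wrong_side, right_side, unknown


scores = {
    "J_CHICKENPOX_53": 0.6,  # score mentioned in this post
    "DK_SIGNED": -0.1,       # placeholder value for illustration only
}
rules = ["J_CHICKENPOX_53", "DK_SIGNED", "HTML_TINY_FONT"]
wrong, right, unknown = review_ham_only_rules(rules, scores)
print("review:", wrong)
print("ok:", right)
print("no score found:", unknown)
```

With only 2000 log entries the counts are small, so the output is
better treated as a list of rules to watch than as a final verdict.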

All of my mail is (currently) scoring correctly overall
but this seems like a good place to get ahead of false
results....

Who would have thought HTML_TINY_FONT would hit only
ham for anyone?

Ideas?

--
Herb Martin
