My SA 3.10 is working very well, and now I am looking to tune some of the rules, especially those 'acquired' from SARE and others who publish rule sets.
The only tool I am using is sa-stats (and I believe there are several programs with this name):

# file: sa-stats.pl
# date: 2005-07-27
# version: 0.9
# author: Dallas Engelken <[EMAIL PROTECTED]>
# desc: SA 3.x log parser

...along with grepping for various patterns in both the log messages and the sa-stats output.

I can begin modifying this script (e.g., to include scores from the .cf files), or perhaps there are good suggestions for managing and re-scoring rules (especially add-on rules)?

Example: grep to find rules that hit EITHER no spam or no ham, and decide whether they are scoring on the correct side and at the correct level.

The following are ham hits that match the " 0.00" pattern (from the 2000 most recent spamd log entries):

RANK  RULE NAME             COUNT  %OFRULES  %OFMAIL  %OFSPAM    %OFHAM
   9  USER_IN_WHITELIST_TO    202      1.87    10.10    0.000  12.16867
  10  USER_IN_WHITELIST       159      1.47     7.95    0.000   9.57831
  12  DK_SIGNED               154      1.43     7.70    0.000   9.27711
  24  RCVD_IN_BSP_TRUSTED      47      0.43     2.35    0.000   2.83133
  49  Y_GAPPY_DASHES5          18      0.17     0.90    0.000   1.08434
  50  TW_YG                    18      0.17     0.90    0.000   1.08434
  52  FU_QUE_NO_SLASH          18      0.17     0.90    0.000   1.08434
  58  FR_DIV_CLEAR             16      0.15     0.80    0.000   0.96386
  67  HTML_TINY_FONT           15      0.14     0.75    0.000   0.90361
  70  DK_VERIFIED              14      0.13     0.70    0.000   0.84337
  71  TW_DH                    14      0.13     0.70    0.000   0.84337
  73  J_CHICKENPOX_53          14      0.13     0.70    0.000   0.84337
  77  SARE_MSGID_LONG40        13      0.12     0.65    0.000   0.78313
  78  FH_MSGID_HUGE_40         13      0.12     0.65    0.000   0.78313
  79  J_CHICKENPOX_75          13      0.12     0.65    0.000   0.78313
  82  SARE_HTML_HEAD_EMPTY     12      0.11     0.60    0.000   0.72289
  84  FR_HEAD_EMPTY            12      0.11     0.60    0.000   0.72289
  85  FS_OBFU_Q1               12      0.11     0.60    0.000   0.72289
  89  J_CHICKENPOX_19          11      0.10     0.55    0.000   0.66265
  92  Y_BEST_UPPERCASE         11      0.10     0.55    0.000   0.66265
  95  J_CHICKENPOX_43          10      0.09     0.50    0.000   0.60241
  96  HTML_TITLE_SUBJ_DIFF     10      0.09     0.50    0.000   0.60241
 100  RCVD_DOUBLE_IP_LOOSE     10      0.09     0.50    0.000   0.60241

E.g., most of those J_ (chickenpox etc.) rules are scored positively (0.6), and this subset is hitting strictly ham (and no spam).
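The "hits only one side" filter described above could also be done with a short script instead of grep. What follows is only a minimal sketch in Python, not part of sa-stats.pl. It assumes each rule row has seven whitespace-separated fields (rank, rule name, count, then four percentages) and that the last two percentages are the share of spam and the share of ham, which is how the numbers in this post appear to line up. The names parse_rows and one_sided are made up for illustration.

```python
import re

# Assumed row layout (inferred from the output in this post, not from the
# sa-stats.pl source): rank, rule name, count, then four percentages, the
# last two being percent-of-spam and percent-of-ham.
ROW = re.compile(
    r"^\s*(\d+)\s+([A-Za-z0-9_]+)\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_rows(lines):
    """Yield (rule, count, pct_spam, pct_ham) for each line shaped like a rule row."""
    for line in lines:
        m = ROW.match(line)
        if m:
            yield m.group(2), int(m.group(3)), float(m.group(6)), float(m.group(7))


def one_sided(lines):
    """Return (ham_only, spam_only): rules that never hit spam / never hit ham."""
    ham_only, spam_only = [], []
    for rule, count, pct_spam, pct_ham in parse_rows(lines):
        if pct_spam == 0.0 and pct_ham > 0.0:
            ham_only.append((rule, count))
        elif pct_ham == 0.0 and pct_spam > 0.0:
            spam_only.append((rule, count))
    return ham_only, spam_only


# A few rows copied from the listing in this post.
sample = [
    "12 DK_SIGNED 154 1.43 7.70 0.000 9.27711",
    "67 HTML_TINY_FONT 15 0.14 0.75 0.000 0.90361",
    "73 J_CHICKENPOX_53 14 0.13 0.70 0.000 0.84337",
]

ham_only, spam_only = one_sided(sample)
for rule, count in ham_only:
    print(f"ham-only: {rule} ({count} hits)")
```

Lines that do not look like a rule row (section titles, separators) are skipped, so the whole sa-stats report could be fed in line by line. Looking up each rule's current score in the .cf files would be a separate step that this sketch does not attempt.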
I am thinking of reducing (or eliminating) the scores on those, and considering a boost for rules that score on the correct side, like DK_SIGNED.

All of my mail is (currently) scoring correctly overall, but this seems like a good place to get ahead of false results. Who would have thought HTML_TINY_FONT would hit only ham for anyone?

Ideas?

--
Herb Martin