Bowie Bailey wrote:
> Since the sought rules have been updating for a while now, I took a
> look at my stats to see how they were doing.  They used to be one
> of my most useful rules, but recently, they don't seem to be doing
> so well.
> 
> Here are the stats for the last month:

That looks like the SARE stats script (modified to show all rules, as
evidenced by a rank of 261).  It doesn't account for FPs or FNs.  I
reformatted your output so it wraps well for email.

> TOP SPAM RULES FIRED
> ------------------------------------------------------------
> RANK    RULE NAME           COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
> ------------------------------------------------------------
>  111    JM_SOUGHT_FRAUD_3     112    0.06    0.36   0.97    0.01
>  154    JM_SOUGHT_2            53    0.03    0.17   0.46    0.16
>  214    JM_SOUGHT_3            31    0.02    0.10   0.27    0.51
>  253    JM_SOUGHT_1            21    0.01    0.07   0.18    0.01
>  261    JM_SOUGHT_FRAUD_2      19    0.01    0.06   0.17    0.01
> ------------------------------------------------------------
> 
> TOP HAM RULES FIRED
> ------------------------------------------------------------
> RANK    RULE NAME           COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
> ------------------------------------------------------------
>   85    JM_SOUGHT_3           99     0.08    0.32   0.27    0.51
>  161    JM_SOUGHT_2           30     0.03    0.10   0.46    0.16
>  351    JM_SOUGHT_FRAUD_3      2     0.00    0.01   0.97    0.01
>  365    JM_SOUGHT_FRAUD_2      2     0.00    0.01   0.17    0.01
>  378    JM_SOUGHT_1            1     0.00    0.00   0.18    0.01
> ------------------------------------------------------------

That is quite different from our masscheck stats.  Today's results at
http://ruleqa.spamassassin.org/20100201/%2FJM_SOUGHT look like this:

   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  9.8564   0.0042   1.000    0.94    0.01  T_JM_SOUGHT_3
  8.1587   0.0068   0.999    0.93    0.01  T_JM_SOUGHT_2
 11.6464   0.0289   0.998    0.89    0.01  T_JM_SOUGHT_1
       0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_1
       0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_2
       0        0   0.500    0.48    0.00  JM_SOUGHT_FRAUD_3


Here are my own numbers, as observed by a custom script which
recalculates results by re-scoring specific rules.  "Rejected"
requires a score of 8.0 and "flagged" requires 5.0.  (The script only
examines three rules at a time, and 33 more messages arrived between
my two runs, which is why the totals below differ.)

JM_SOUGHT_1 ( 0.3% of 34831 total) with score-bump of -4:
  124 rejected
  1 flagged, with 0 (0%) that would have been rejected
  1 passed, with -1 (-0.0%) that would have been flagged
JM_SOUGHT_2 ( 0.2% of 34831 total) with score-bump of -4:
  47 rejected
  8 flagged, with -2 (-0.1%) that would have been rejected
  24 passed, with -8 (-0.0%) that would have been flagged
JM_SOUGHT_3 ( 0.5% of 34831 total) with score-bump of -4:
  121 rejected
  10 flagged, with -3 (-0.1%) that would have been rejected
  60 passed, with -10 (-0.0%) that would have been flagged
JM_SOUGHT_FRAUD_1 ( 0.0% of 34864 total) with score-bump of -3:
  34 rejected
  0 flagged, with 0 (0%) that would have been rejected
  0 passed, with 0 (0%) that would have been flagged
JM_SOUGHT_FRAUD_2 ( 0.5% of 34864 total) with score-bump of -3:
  203 rejected
  0 flagged, with 0 (0%) that would have been rejected
  0 passed, with 0 (0%) that would have been flagged
JM_SOUGHT_FRAUD_3 ( 1.3% of 34864 total) with score-bump of -3:
  486 rejected
  0 flagged, with -4 (-0.2%) that would have been rejected
  1 passed, with 0 (0%) that would have been flagged

My script was mostly written for adding points rather than subtracting
them, so the notation is a little less intuitive.  For example, the
"-2" and "-8" for rule 2 mean it pushed two mails from flagged to
rejected and pushed eight mails over the flag threshold.
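
If it helps, the core of that recalculation is simple: for each message
that hit a rule, subtract the rule's points (the score-bump shown above)
and check which side of the flag and reject thresholds it lands on.  A
minimal sketch of the idea, not my script's exact output semantics:

  FLAG, REJECT = 5.0, 8.0

  def impact(scores_with_rule, rule_points):
      """scores_with_rule: total SA scores of messages that hit the rule.
      rule_points: the points the rule contributed (the score-bump above)."""
      pushed_to_reject = pushed_to_flag = 0
      for score in scores_with_rule:
          without = score - rule_points
          if score >= REJECT and without < REJECT:
              pushed_to_reject += 1    # rejected only because of this rule
          elif score >= FLAG and without < FLAG:
              pushed_to_flag += 1      # flagged only because of this rule
      return pushed_to_reject, pushed_to_flag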

Recall that unlike the masscheck (which is hand-verified), log parsers
like the SARE script and my own script have no knowledge of FPs or
FNs.  I'd bet that most, if not all, of the 86 messages that the SOUGHT
rules noticed but didn't push over the 5.0 mark were FNs.
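
To make that limitation concrete, a log-based tally like the SARE
script's boils down to something like the sketch below (hypothetical
data structures, not the real script).  It only ever sees SA's verdict
and the list of rules that fired, so any FP is counted as spam and any
FN is counted as ham:

  from collections import Counter

  def tally(messages):
      """messages: iterable of (judged_spam, rules_hit) pairs, one per
      scanned message, as recovered from the mail log."""
      spam_hits, ham_hits = Counter(), Counter()
      n_spam = n_ham = 0
      for judged_spam, rules in messages:
          if judged_spam:
              n_spam += 1
          else:
              n_ham += 1
          for rule in rules:
              (spam_hits if judged_spam else ham_hits)[rule] += 1
      total = max(n_spam + n_ham, 1)
      for rule in sorted(set(spam_hits) | set(ham_hits)):
          count = spam_hits[rule] + ham_hits[rule]
          print("%-20s %6d  %%ofmail=%5.2f  %%ofspam=%5.2f  %%ofham=%5.2f" % (
              rule, count,
              100.0 * count / total,
              100.0 * spam_hits[rule] / max(n_spam, 1),
              100.0 * ham_hits[rule] / max(n_ham, 1)))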

Of course, the reason I have a flag threshold and a reject threshold
is so that I can still deliver low-scoring FPs.  My users get them
flagged as spam, with SA's spaminess score in the subject.  That means
instead of risking the loss of 86 messages, I only risked losing 9,
and thanks to the SMTP-time reject nature of my implementation, the
senders got notices of the delivery failure.  I have not yet had a
complaint about rejections based on SOUGHT rules.  (Complaints are
rare in any case, and usually related to massive misconfigurations on
the sending relay.)
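
For the record, the whole policy fits in a few lines.  A hedged sketch
of the decision logic follows; the thresholds match what I run, but the
function and the 550 text are illustrative, not my actual glue code:

  FLAG_THRESHOLD = 5.0
  REJECT_THRESHOLD = 8.0

  def disposition(score, subject):
      """Map a SpamAssassin score to an SMTP action and a subject line."""
      if score >= REJECT_THRESHOLD:
          # Rejected at SMTP time, so the sending relay owes the sender a DSN.
          return ("550 5.7.1 message rejected as spam", subject)
      if score >= FLAG_THRESHOLD:
          # Delivered, but tagged with SA's score so users can filter on it.
          return ("250 OK", "[SPAM score=%.1f] %s" % (score, subject))
      return ("250 OK", subject)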
