Bowie Bailey wrote:
> Since the sought rules have been updating for a while now, I took a
> look at my stats to see how they were doing. They used to be one
> of my most useful rules, but recently, they don't seem to be doing
> so good.
>
> Here are the stats for the last month:

That looks like the SARE stats script (modified to show all rules, as evidenced by rank 261). It doesn't account for FPs or FNs. I reformatted your output so it wraps well for email.

> TOP SPAM RULES FIRED
> ------------------------------------------------------------
> RANK  RULE NAME         COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
> ------------------------------------------------------------
>  111  JM_SOUGHT_FRAUD_3   112     0.06    0.36    0.97   0.01
>  154  JM_SOUGHT_2          53     0.03    0.17    0.46   0.16
>  214  JM_SOUGHT_3          31     0.02    0.10    0.27   0.51
>  253  JM_SOUGHT_1          21     0.01    0.07    0.18   0.01
>  261  JM_SOUGHT_FRAUD_2    19     0.01    0.06    0.17   0.01
> ------------------------------------------------------------
>
> TOP HAM RULES FIRED
> ------------------------------------------------------------
> RANK  RULE NAME         COUNT %OFRULES %OFMAIL %OFSPAM %OFHAM
> ------------------------------------------------------------
>   85  JM_SOUGHT_3          99     0.08    0.32    0.27   0.51
>  161  JM_SOUGHT_2          30     0.03    0.10    0.46   0.16
>  351  JM_SOUGHT_FRAUD_3     2     0.00    0.01    0.97   0.01
>  365  JM_SOUGHT_FRAUD_2     2     0.00    0.01    0.17   0.01
>  378  JM_SOUGHT_1           1     0.00    0.00    0.18   0.01
> ------------------------------------------------------------

That is quite different from our masscheck stats. Today's results at http://ruleqa.spamassassin.org/20100201/%2FJM_SOUGHT look like this:

      SPAM%    HAM%    S/O  RANK  SCORE  NAME
     9.8564  0.0042  1.000  0.94   0.01  T_JM_SOUGHT_3
     8.1587  0.0068  0.999  0.93   0.01  T_JM_SOUGHT_2
    11.6464  0.0289  0.998  0.89   0.01  T_JM_SOUGHT_1
          0       0  0.500  0.48   0.00  JM_SOUGHT_FRAUD_1
          0       0  0.500  0.48   0.00  JM_SOUGHT_FRAUD_2
          0       0  0.500  0.48   0.00  JM_SOUGHT_FRAUD_3

Here are my own numbers, as observed by a custom script that recalculates results by re-scoring specific rules. "Rejected" requires a score of 8.0 and "flagged" requires 5.0. (It only examines three rules at a time, and we got 33 messages between my runs.)

  JM_SOUGHT_1 ( 0.3% of 34831 total) with score-bump of -4:
    124 rejected
      1 flagged, with 0 (0%) that would have been rejected
      1 passed, with -1 (-0.0%) that would have been flagged

  JM_SOUGHT_2 ( 0.2% of 34831 total) with score-bump of -4:
     47 rejected
      8 flagged, with -2 (-0.1%) that would have been rejected
     24 passed, with -8 (-0.0%) that would have been flagged

  JM_SOUGHT_3 ( 0.5% of 34831 total) with score-bump of -4:
    121 rejected
     10 flagged, with -3 (-0.1%) that would have been rejected
     60 passed, with -10 (-0.0%) that would have been flagged

  JM_SOUGHT_FRAUD_1 ( 0.0% of 34864 total) with score-bump of -3:
     34 rejected
      0 flagged, with 0 (0%) that would have been rejected
      0 passed, with 0 (0%) that would have been flagged

  JM_SOUGHT_FRAUD_2 ( 0.5% of 34864 total) with score-bump of -3:
    203 rejected
      0 flagged, with 0 (0%) that would have been rejected
      0 passed, with 0 (0%) that would have been flagged

  JM_SOUGHT_FRAUD_3 ( 1.3% of 34864 total) with score-bump of -3:
    486 rejected
      0 flagged, with -4 (-0.2%) that would have been rejected
      1 passed, with 0 (0%) that would have been flagged

My script was mostly written for adding points rather than subtracting them, so the notation is a little less intuitive here. For example, rule 2 (JM_SOUGHT_2) moved two mails from flagged to rejected and caused eight mails to get flagged.

Recall that unlike the masscheck (which is hand-verified), log parsers like the SARE script and my own script have no knowledge of FPs or FNs. I bet most, if not all, of the 86 messages that the SOUGHT rules noticed but didn't push up to the 5.0 mark were FNs.

Of course, the reason I have both a flag threshold and a reject threshold is so that I can still deliver low-scoring FPs. My users get them flagged as spam, with SA's spaminess score in the subject. That means instead of risking the loss of 86 messages, I only risked losing 9, and because my rejections happen at SMTP time, the senders got notices of the delivery failure.
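
For anyone wanting to try the same kind of check, the re-scoring idea can be sketched in a few lines. This is a hypothetical reconstruction, not the script that produced the numbers above: it assumes the log has already been parsed into records of (final score, rules hit), using an invented LoggedMessage type, and it treats a score at or above a threshold as meeting it. It tallies each matching message's verdict before and after the bump, then sums the per-rule counts transcribed from the output above to show where the 86 and the 9 come from.

```python
from collections import Counter
from dataclasses import dataclass

# Thresholds from the post: 8.0 is rejected at SMTP time, 5.0 is flagged.
REJECT = 8.0
FLAG = 5.0


@dataclass(frozen=True)
class LoggedMessage:
    """Hypothetical parsed log entry: final score plus the rules that hit."""
    score: float
    rules: frozenset


def verdict(score: float) -> str:
    """Map a score onto the three outcomes (assumes a threshold is met at >=)."""
    if score >= REJECT:
        return "rejected"
    if score >= FLAG:
        return "flagged"
    return "passed"


def rescore(messages, rule, bump):
    """Count (current, re-scored) verdict pairs for messages that hit `rule`.

    A negative `bump` approximates taking the rule's points away.
    """
    moves = Counter()
    for msg in messages:
        if rule in msg.rules:
            moves[(verdict(msg.score), verdict(msg.score + bump))] += 1
    return moves


# Made-up log entries, only to exercise the function.
log = [
    LoggedMessage(12.0, frozenset({"JM_SOUGHT_2"})),  # stays rejected
    LoggedMessage(9.0, frozenset({"JM_SOUGHT_2"})),   # rejected -> flagged
    LoggedMessage(6.5, frozenset({"JM_SOUGHT_2"})),   # flagged -> passed
    LoggedMessage(3.0, frozenset({"JM_SOUGHT_2"})),   # passed either way
    LoggedMessage(9.0, frozenset({"BAYES_99"})),      # rule did not hit
]
moves = rescore(log, "JM_SOUGHT_2", -4)
for (before, after), n in sorted(moves.items()):
    print(f"{before} -> {after}: {n}")

# Per-rule counts transcribed from the script output above.
passed_anyway = {"JM_SOUGHT_1": 1, "JM_SOUGHT_2": 24, "JM_SOUGHT_3": 60,
                 "JM_SOUGHT_FRAUD_1": 0, "JM_SOUGHT_FRAUD_2": 0,
                 "JM_SOUGHT_FRAUD_3": 1}
rejected_only_due_to_rule = {"JM_SOUGHT_1": 0, "JM_SOUGHT_2": 2,
                             "JM_SOUGHT_3": 3, "JM_SOUGHT_FRAUD_1": 0,
                             "JM_SOUGHT_FRAUD_2": 0, "JM_SOUGHT_FRAUD_3": 4}
print(sum(passed_anyway.values()))              # 86
print(sum(rejected_only_due_to_rule.values()))  # 9
```

Keeping (before, after) pairs rather than net deltas also makes a two-step move visible, e.g. a message at 8.5 dropping to 4.5 with a bump of -4. Note that the totals are plain sums of per-rule counts, so a message hit by two SOUGHT rules would be counted twice.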

I have not yet had a complaint about rejections based on the SOUGHT rules. (Complaints are rare to begin with, and they are usually related to massive misconfigurations on the sending relay.)