On 09/24, David Bennett wrote:
> It occurred to me that a sender that is paying their way into my inbox
> is almost certainly sending me junk mail.   A little research in my
> inbox and it turns out to be right on the money.  All stuff that I
> didn't want. 

I'm very curious what exactly your statistics looked like.  For comparison,
I'll point you to the publicly available spamassassin Rule QA stats below.

> # commercial buy-in whitelists (most likely junk)
> score RCVD_IN_BSP_TRUSTED 0.500
> score RCVD_IN_BSP_OTHER 0.500
> score RCVD_IN_BONDEDSENDER 0.500
> score HABEAS_ACCREDITED_COI 0 0.5 0 0.5
> score HABEAS_ACCREDITED_SOI 0 0.25 0 0.25
> score HABEAS_CHECKED 0 0.1 0 0.1

I don't see any of the above in the current spamassassin rules.  What
version of spamassassin are you running?  Anything before 3.3.0 is very
much not recommended.

Ah yes, all but RCVD_IN_BONDEDSENDER were replaced with
RCVD_IN_RP_CERTIFIED in version 3.3.0:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6247

And it looks like RCVD_IN_BONDEDSENDER was replaced by RCVD_IN_BSP_OTHER
and RCVD_IN_BSP_TRUSTED more than four years ago:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5476

I'm guessing you're not actually getting hits on any of these six, and just
added them based on an article that hasn't been updated in four years?
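
One way to check whether a rule name is still defined anywhere is to scan
the installed rule files for it.  The following is a minimal sketch, not an
official tool: the function name is made up, and it only looks for the
common rule-definition keywords (header, body, rawbody, uri, full, meta),
so plugin-specific definitions would be missed.

```python
import re
from pathlib import Path

# Rule names taken from the quoted config above.
RULES = [
    "RCVD_IN_BSP_TRUSTED",
    "RCVD_IN_BSP_OTHER",
    "RCVD_IN_BONDEDSENDER",
    "HABEAS_ACCREDITED_COI",
    "HABEAS_ACCREDITED_SOI",
    "HABEAS_CHECKED",
]

def defined_rules(rules_dir, names):
    """Return the subset of `names` that some *.cf file under `rules_dir` defines.

    A bare `score NAME ...` line does not count as a definition.
    """
    found = set()
    for cf in Path(rules_dir).rglob("*.cf"):
        text = cf.read_text(errors="replace")
        for name in names:
            pattern = re.compile(
                rf"^\s*(?:header|body|rawbody|uri|full|meta)\s+{re.escape(name)}\b",
                re.MULTILINE,
            )
            if pattern.search(text):
                found.add(name)
    return found
```

Call it as defined_rules(some_dir, RULES), where some_dir is wherever your
installation keeps its rule files (that location varies and is not assumed
here).  A name missing from the result has no definition there, so a score
line for it has nothing to apply to.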

> score RCVD_IN_IADB_VOUCHED 0 0.2 0 0.2
> score RCVD_IN_IADB_DOPTIN 0 0.4 0 0.4
> score RCVD_IN_IADB_ML_DOPTIN 0 0.6 0 0.6

http://ruleqa.spamassassin.org/?daterev=20110924-r1175130-n&rule=%2FRCVD_IN_IADB
  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0        0   0.0117   0.000    0.46    0.00  RCVD_IN_IADB_VOUCHED  
      0        0   0.7806   0.000    0.66    0.00  RCVD_IN_IADB_DOPTIN  
      0        0        0   0.500    0.45    0.00  RCVD_IN_IADB_ML_DOPTIN  

These three rules hit ZERO out of 362,124 spams.  They also hit a pretty
insignificant amount of ham (non-spam): under 0.8% for the most common one.

> score RCVD_IN_DNSWL_LOW 0 0.1 0 0.1
> score RCVD_IN_DNSWL_MED 0 0.4 0 0.4
> score RCVD_IN_DNSWL_HI 0 0.8 0 0.8

http://ruleqa.spamassassin.org/?daterev=20110924-r1175130-n&rule=%2FDNSWL
  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0   0.0003   1.8893   0.000    0.75    0.00  RCVD_IN_DNSWL_HI  
      0   0.0224  25.6371   0.001    0.86    0.00  RCVD_IN_DNSWL_MED  
      0   0.0376  12.0356   0.003    0.79    0.00  RCVD_IN_DNSWL_LOW  
      0   0.2090  21.8867   0.009    0.66    0.00  RCVD_IN_DNSWL_NONE  

25.6% of ham hits RCVD_IN_DNSWL_MED.  So you're adding a score of 0.4
to a quarter of your ham, when that rule is only hitting 0.02% of spam
(81 out of 362,124 spams).  And that's just one of the three dnswl rules
you're scoring as bad.
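
As a rough check of that arithmetic, here is a minimal Python sketch.  The
function names are mine, and it assumes the S/O column is
SPAM% / (SPAM% + HAM%); that formula reproduces the DNSWL rows above to
three decimals, but treat it as an assumption rather than the Rule QA
definition.

```python
SPAM_CORPUS = 362_124  # spam messages in this mass-check run (from this post)

# (SPAM%, HAM%) per rule, copied from the Rule QA table above.
DNSWL = {
    "RCVD_IN_DNSWL_HI": (0.0003, 1.8893),
    "RCVD_IN_DNSWL_MED": (0.0224, 25.6371),
    "RCVD_IN_DNSWL_LOW": (0.0376, 12.0356),
    "RCVD_IN_DNSWL_NONE": (0.2090, 21.8867),
}

def spam_hits(spam_pct, corpus=SPAM_CORPUS):
    """Approximate number of spam messages a rule hit, from its SPAM%."""
    return round(spam_pct / 100 * corpus)

def s_over_o(spam_pct, ham_pct):
    """Assumed S/O: the spam share of a rule's hits, computed from percentages."""
    return spam_pct / (spam_pct + ham_pct)

for name, (spam_pct, ham_pct) in DNSWL.items():
    print(f"{name:20s} spam hits ~{spam_hits(spam_pct):4d}  "
          f"S/O {s_over_o(spam_pct, ham_pct):.3f}")
```

For RCVD_IN_DNSWL_MED this gives roughly 81 spam hits, matching the figure
above, against about a quarter of all ham.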

I have pretty graphs of dnswl stats over time here:
http://www.chaosreigns.com/dnswl/
(Chrome renders that badly and firefox renders it well; the
non-standardization pains me.)
The two graphs at the bottom show spam vs. ham numbers in the mass-check
corpora, and are not specific to dnswl.


I assure you, if there were a test that was letting spam through and
wasn't worth keeping anyway (a test is worth keeping when the overwhelming
majority of the emails it hits are ham, since it then reduces false
positives, which is more important than missing a few spams), the
spamassassin developers would be very interested to hear about it, and
would remove it.

If you have that kind of information, please do provide it.

-- 
"If you are not paranoid... you may not be paying attention."
 - j...@creative-net.net, on an IDPA mailing list
http://www.ChaosReigns.com
