On Fri, 2010-08-20 at 20:54 +0200, Karsten Bräckelmann wrote:
> Because it depends. Some lists are suitable for deep-parsing. Some are
> not.
> 
> 
> Moreover, IMHO you are barking up the wrong tree. In your OP you said a
> message has been *rejected* by your SMTP. Yet you are focusing entirely
> on the RCVD_IN_BL_SPAMCOP_NET and RCVD_IN_SORBS_WEB hits, which by
> themselves won't even push the score above the default spam threshold.
> 
> Thus, vital but missing pieces of the puzzle are (a) which rules
> triggered in addition to them, and (b) at what threshold your SMTP
> rejects a message.
> 
> The combined score of these rules is nowhere even close to a sensible
> rejection limit. Whatever else the message tripped on, it accounts for
> the lion's share.

Just to back up my claim with numbers, here are the scores for both 3.2
and 3.3 branches. Minimally edited for readability.

  $ egrep 'RCVD_IN_(BL_SPAMCOP_NET|SORBS_WEB)' 3.[23]/rules/50_scores.cf

  3.2/rules/50_scores.cf: score RCVD_IN_BL_SPAMCOP_NET 0 2.188 0 1.960
  3.2/rules/50_scores.cf: score RCVD_IN_SORBS_WEB      0 1.117 0 0.619

  3.3/rules/50_scores.cf: score RCVD_IN_BL_SPAMCOP_NET 0 1.246 0 1.347
  3.3/rules/50_scores.cf: score RCVD_IN_SORBS_WEB      0 0.614 0 0.770

As you can see, even the aging 3.2 rule-set with Bayes disabled scores
these at ~3.3 combined (2.188 + 1.117). That is the worst possible
combination, and it still leaves some way to go before crossing the
spam threshold of 5.0. Enabling Bayes, or using the latest stable SA
release, only widens the margin before a message is considered spammy.
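
For anyone who wants to check the arithmetic, here is a minimal sketch
that sums both rules per score set. The numbers are copied from the
egrep output above. The four columns follow SA's score-set order (set 0:
no Bayes, no net; set 1: no Bayes, net; set 2: Bayes, no net; set 3:
Bayes and net). The names SCORES and combined() are made up for this
illustration and are not part of SA.

```python
# Scores copied from the 50_scores.cf excerpt in this message.
# Column order is SpamAssassin's four score sets:
#   0: Bayes off, network tests off   1: Bayes off, network tests on
#   2: Bayes on,  network tests off   3: Bayes on,  network tests on
SCORES = {
    "3.2": {
        "RCVD_IN_BL_SPAMCOP_NET": (0, 2.188, 0, 1.960),
        "RCVD_IN_SORBS_WEB": (0, 1.117, 0, 0.619),
    },
    "3.3": {
        "RCVD_IN_BL_SPAMCOP_NET": (0, 1.246, 0, 1.347),
        "RCVD_IN_SORBS_WEB": (0, 0.614, 0, 0.770),
    },
}


def combined(branch):
    """Return the summed score of both rules for each of the 4 score sets."""
    columns = zip(*SCORES[branch].values())
    return tuple(round(sum(col), 3) for col in columns)


for branch in sorted(SCORES):
    print(branch, combined(branch))
```

The network-enabled sets come out at roughly 3.3 and 2.6 for 3.2, and
roughly 1.9 and 2.1 for 3.3. All of them are well below 5.0.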

These numbers have been optimized with a spam threshold of 5.0 (at the
time of their creation) -- to *minimize* false classification [1], while
considering FPs more severe than FNs.

With that in mind, a sensible reject limit is 8.0 or even higher [2]. In
that case the remaining hits must account for about 4.7 or more. In
other words, they would (almost) have pushed the message in question
over the spam threshold on their own, even without those RCVD_IN_* hits.
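
The same reasoning as a small sketch, assuming a reject limit of exactly
8.0 and the worst-case 3.2 scores without Bayes. The constant and
function names are invented for this example. The actual reject limit
of the SMTP in question is unknown.

```python
SPAM_THRESHOLD = 5.0   # default spam threshold the scores were tuned for
REJECT_LIMIT = 8.0     # assumption: the "sensible" reject limit from the text

# Worst case from the listing: 3.2 rule-set, Bayes disabled, network tests on.
WORST_CASE_RCVD = 2.188 + 1.117


def remaining_needed(limit, known):
    """Score all other rules must contribute for the total to reach `limit`."""
    return round(limit - known, 3)


other_hits = remaining_needed(REJECT_LIMIT, WORST_CASE_RCVD)
print("other rules must contribute at least", other_hits)
print("short of the spam threshold by", round(SPAM_THRESHOLD - other_hits, 3))
```

With a higher reject limit, or with the lower 3.3 scores, the share
contributed by the other rules only grows.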


[1] It is impossible to eliminate FPs and FNs at the same time. In your
    OP you mentioned a single rejected message, right? ;)

[2] Based on experience and years of discussion on this list.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
