Note, I am not on the SARE list.  This message is more directed at the
SARE developers and thus that list.  It copies the SA users list.

I wrote:
>>> This rule is poorly written as it does not limit its examination
>>> to the last external relay.

LuKreme responded:
>> The rule quite specifically does not look at the top received
>> header because all the spammers were using US based relays to avoid
>> checks like the one you suggested.

I believed otherwise and stated as much:
> Then that is unfair discrimination, blocking all of a major ISP's
> customers' traffic.  I suspect the rule instead pre-dates either the
> creation of the X-Spam-Relays-External pseudo-header or the author(s)'
> familiarity with it.

I created some tests for this hypothesis and entered them into my
sandbox for masscheck data.  Results are in:  Spammers do not send
mail from HINET zombies in through US based relays.

My tests compared two versions of my rule (suffixed 2 and 3) versus
the original:  http://ruleqa.spamassassin.org/?rule=%2FSARE

   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  0.4481   0.0019   0.996    0.81    0.01  T_SARE_RECV_SPAM_DOMN0B2
  0.4481   0.0019   0.996    0.81    0.01  T_SARE_RECV_SPAM_DOMN0B3
  0.4511   0.0045   0.990    0.81    0.01  T_SARE_RECV_SPAM_DOMN0B

This proves that the SARE rule is unnecessarily broad, catching a
negligible excess in spam and ham.  Rules #2 and #3 performed exactly
the same, confirming my unvoiced suspicion that the rule was checking
against too broad a domain list.
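
As a rough cross-check of the table above, the S/O column can be reproduced from the SPAM% and HAM% columns. This is only a sketch: it assumes S/O is computed as spam% / (spam% + ham%), and the helper name is mine, not part of any SA tool.

```python
# Figures copied from the masscheck table: (SPAM%, HAM%, reported S/O).
rows = {
    "T_SARE_RECV_SPAM_DOMN0B2": (0.4481, 0.0019, 0.996),
    "T_SARE_RECV_SPAM_DOMN0B3": (0.4481, 0.0019, 0.996),
    "T_SARE_RECV_SPAM_DOMN0B": (0.4511, 0.0045, 0.990),
}


def spam_over_overall(spam_pct, ham_pct):
    """Assumed definition of S/O: share of a rule's hits that are spam."""
    return spam_pct / (spam_pct + ham_pct)


for name, (spam_pct, ham_pct, reported) in rows.items():
    computed = spam_over_overall(spam_pct, ham_pct)
    print(f"{name}: computed {computed:.3f}, reported {reported:.3f}")
```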

The 6 hams my tests hit had already been scored between 13 and 17
points by the system (holy crap!).  The original test hit those same
6 hams(!) plus 8 more that scored 3 or lower.  Looking at spam scoring
under 10, my tests missed 12 spams that the original caught (of 34
missed spams overall).

Therefore, it is worthwhile to migrate to the more conservative rule
(my #3):

header SARE_RECV_SPAM_DOMN0B   X-Spam-Relays-External =~ /^[^\]]+ rdns=[^ ]{0,25}\bdynamic\.hinet\.net /
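
To see why this pattern only looks at the last external relay, here is a small sketch that runs it against two made-up X-Spam-Relays-External values. The sample strings, addresses, and hostnames are hypothetical, the field layout is my assumption of what the pseudo-header looks like, and the pattern is written with every literal dot escaped.

```python
import re

# The rule's pattern, with each literal dot escaped. "^[^\]]+" cannot cross
# a "]", so only the first bracketed relay entry can ever match.
PATTERN = re.compile(r"^[^\]]+ rdns=[^ ]{0,25}\bdynamic\.hinet\.net ")


def last_external_is_hinet_dynamic(relays_external):
    """Return True if the first (last-external) relay has a HINET dynamic rDNS."""
    return PATTERN.search(relays_external) is not None


# Hypothetical sample: the zombie connects straight to our MX.
direct = (
    "[ ip=192.0.2.10 rdns=192-0-2-10.dynamic.hinet.net helo=pc "
    "by=mx.example.org ident= envfrom= intl=0 id=A1 auth= msa=0 ]"
)

# Hypothetical sample: the same host sends through a clean smarthost first.
via_smarthost = (
    "[ ip=198.51.100.5 rdns=smtp.example.com helo=smtp.example.com "
    "by=mx.example.org ident= envfrom= intl=0 id=B2 auth= msa=0 ] "
    "[ ip=192.0.2.10 rdns=192-0-2-10.dynamic.hinet.net helo=pc "
    "by=smtp.example.com ident= envfrom= intl=0 id=C3 auth= msa=0 ]"
)

print(last_external_is_hinet_dynamic(direct))
print(last_external_is_hinet_dynamic(via_smarthost))
```

The second sample is the smarthost case: the HINET host is still in the relay list, but not in the first entry, so the pattern does not fire.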


HOWEVER:

Perhaps more important to note is the overlap.  Here's the data (all
versions had identical results), truncated to wrap; second percent is
the percent of the other rule's hits that overlap this rule's hits:
> overlap spam: 100% of [this] also hit RAZOR2_CF_RANGE_51_100; 0%
> overlap spam: 100% of [this] also hit RAZOR2_CF_RANGE_E4_51_100; 0%
> overlap spam: 100% of [this] also hit RAZOR2_CHECK; 0%
> overlap spam: 100% of [this] also hit RCVD_IN_PBL; 1%
> overlap spam: 100% of [this] also hit RDNS_DYNAMIC; 1%

RDNS_DYNAMIC is a meta rule triggered by these:
> overlap spam: 100% of [this] also hit __RDNS_DYNAMIC_IPADDR; 0%
> overlap spam: 100% of [this] also hit __RDNS_INDICATOR_DYN; 10%

On SA 3.2.5, that's 0.5 + 1.5 + 0.5 + 0.509 + 0.1 = 3.109
On SA 3.3.0, that's 0.5 + 0.642 + 0.922 + 3.335 + 0.982 = 6.381

(Without network tests, SA-3.2.5 scores that at 0.1, while SA-3.3.0
scores it at 1.663 with bayes on or 2.639 with bayes off.  The sums
above use the more pessimistic figures; the total would be higher with
bayes on SA-3.2.5 and higher without bayes on SA-3.3.0.)

Don't forget that 90+% of the hits on svn-trunk had at least four more
points than the ones I just added up from the 100% overlap.

Now add the original rule's 1.666 points.  Even the *minimum* scores
of 4.775 and 8.047 are hard to swallow for HINET customers who may not
have a choice of vendors.  By using an external smarthost, Jidanni was
able to bypass all but SARE's 1.666 points.  Since my version only
examines the last-external relay, it would be bypassed by a clean
smarthost too.
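
For anyone who wants to check the arithmetic, the sums above can be redone as a quick sketch. The per-rule figures are the ones quoted in this message, kept as plain lists rather than tied to rule names; the variable names are mine.

```python
from decimal import Decimal

# Scores of the five 100%-overlap rules, as quoted above for each SA version.
overlap_scores = {
    "SA 3.2.5": ["0.5", "1.5", "0.5", "0.509", "0.1"],
    "SA 3.3.0": ["0.5", "0.642", "0.922", "3.335", "0.982"],
}

# Score of the original SARE rule, as quoted above.
sare_rule_score = Decimal("1.666")

totals = {}
for version, scores in overlap_scores.items():
    overlap_total = sum((Decimal(s) for s in scores), Decimal("0"))
    totals[version] = (overlap_total, overlap_total + sare_rule_score)
    print(f"{version}: overlap {overlap_total}, with SARE rule {totals[version][1]}")
```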

This should pretty clearly illustrate that the last two versions of
spamassassin don't benefit from this rule at all.  For those convinced
there is merit for this rule on legacy SA versions, I suggest my
rewrite as it removes more than half the false positives.

The fact that 70_sare_header1.cf is chock-full of rules like this
should stand as a good warning to anybody considering any of the SARE
channels numbered 1+, which were marked as increased risk even when
they were still actively maintained!
