Note: I am not on the SARE list. This message is directed mainly at the SARE developers, and thus at that list. It is copied to the SA users list.

I wrote:
>>> This rule is poorly written as it does not limit its examination
>>> to the last external relay.

LuKreme responded:
>> The rule quite specifically does not look at the top received
>> header because all the spammers were using US based relays to avoid
>> checks like the one you suggested.

I believed otherwise and stated as much:
> Then that is unfair discrimination, blocking all of a major ISP's
> customers' traffic.  I suspect the rule instead pre-dates either the
> creation of the X-Spam-Relays-External pseudo-header or the author(s)'
> familiarity with it.

I created some tests for this hypothesis and entered them into my sandbox for masscheck data. Results are in: Spammers do not send mail from HINET zombies in through US based relays.

My tests compared two versions of my rule (suffixed 2 and 3) versus the original:

http://ruleqa.spamassassin.org/?rule=%2FSARE

  SPAM%    HAM%    S/O   RANK  SCORE  NAME
 0.4481  0.0019  0.996   0.81   0.01  T_SARE_RECV_SPAM_DOMN0B2
 0.4481  0.0019  0.996   0.81   0.01  T_SARE_RECV_SPAM_DOMN0B3
 0.4511  0.0045  0.990   0.81   0.01  T_SARE_RECV_SPAM_DOMN0B

This shows that the SARE rule is unnecessarily broad, catching a negligible excess of both spam and ham. Rules #2 and #3 performed exactly the same, confirming my unvoiced suspicion that the rule was checking against too broad a domain list.

The 6 ham my tests hit were already scored by the system at 13-17 points (holy crap!). The original test hit those same 6 hams(!) in the 13-17 score range, plus 8 more hams that scored 3 or lower. Looking at spam scoring under 10, my tests missed 12 spams that the original caught (of 34 missed spams overall).

Therefore, it is worthwhile to migrate to the more conservative rule (my #3):

  header SARE_RECV_SPAM_DOMN0B X-Spam-Relays-External =~ /^[^\]]+ rdns=[^ ]{0,25}\bdynamic\.hinet\.net /

HOWEVER: Perhaps more important to note is the overlap.
Here's the data (all versions had identical results), truncated to wrap; second percent is the percent of the other rule's hits that overlap this rule's hits:

> overlap spam: 100% of [this] also hit RAZOR2_CF_RANGE_51_100; 0%
> overlap spam: 100% of [this] also hit RAZOR2_CF_RANGE_E4_51_100; 0%
> overlap spam: 100% of [this] also hit RAZOR2_CHECK; 0%
> overlap spam: 100% of [this] also hit RCVD_IN_PBL; 1%
> overlap spam: 100% of [this] also hit RDNS_DYNAMIC; 1%

RDNS_DYNAMIC is a meta rule triggered by these:

> overlap spam: 100% of [this] also hit __RDNS_DYNAMIC_IPADDR; 0%
> overlap spam: 100% of [this] also hit __RDNS_INDICATOR_DYN; 10%

On SA 3.2.5, that's 0.5 + 1.5 + 0.5 + 0.509 + 0.1 = 3.109
On SA 3.3.0, that's 0.5 + 0.642 + 0.922 + 3.335 + 0.982 = 6.381

(Without network tests, SA-3.2.5 scores that 0.1 while SA-3.3.0 scores it at 1.663 (with bayes on) or 2.639. The above stanza used the more pessimistic sum and would be higher with bayes on SA-3.2.5 and higher without bayes on SA-3.3.0.)

Don't forget that 90+% of the hits on svn-trunk had at least four more points than the ones I just added up from the 100% overlap.

Now add the original rule's 1.666 points. Even the *minimum* scores of 4.775 and 8.047 are hard to swallow for HINET customers who may not have a choice of vendors.

By using an external smarthost, Jidanni was able to bypass all but SARE's 1.666 points. Since my version only examines the last-external relay, it would be bypassed by a clean smarthost too.

This should pretty clearly illustrate that the last two versions of spamassassin don't benefit from this rule at all. For those convinced there is merit for this rule on legacy SA versions, I suggest my rewrite as it removes more than half the false positives.

The fact that 70_sare_header1.cf is chock-full of rules like this should stand as a good warning to anybody considering any of the SARE channels numbered 1+ for increased risk (as marked when they were still actively maintained!).
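
For anyone who wants to see how the last-external anchoring behaves, here is a minimal Python sketch of the rewritten rule's pattern. It is an illustration only, not SpamAssassin itself. The relay strings are made up (documentation IP ranges, hypothetical hostnames, and a simplified set of X-Spam-Relays-External fields), the regex is transcribed into Python's re syntax with every literal dot escaped, and ANY_RELAY is a hypothetical unanchored variant for comparison, not the actual original SARE rule.

```python
import re

# Pattern from the rewritten rule. The leading ^[^\]]+ cannot cross a "]",
# so only the first bracketed entry (the last external relay) is examined.
LAST_EXTERNAL = re.compile(r"^[^\]]+ rdns=[^ ]{0,25}\bdynamic\.hinet\.net ")

# Hypothetical unanchored variant: matches the same rDNS in ANY relay entry.
ANY_RELAY = re.compile(r"rdns=[^ ]{0,25}\bdynamic\.hinet\.net ")

# Made-up pseudo-header values; real ones carry more fields.
HINET_HOP = "[ ip=192.0.2.10 rdns=192-0-2-10.dynamic.hinet.net helo=pc by=mx.example.org ]"
SMARTHOST_HOP = "[ ip=198.51.100.5 rdns=smtp.smarthost.example helo=smtp by=mx.example.org ]"

# Case 1: the HINET dynamic host hands the mail straight to our MX.
direct = HINET_HOP

# Case 2: the same host sends through a clean external smarthost, so the
# smarthost becomes the first (most recent) entry in the pseudo-header.
via_smarthost = SMARTHOST_HOP + " " + HINET_HOP


def hits_last_external(relays: str) -> bool:
    """True if the last external relay has a dynamic.hinet.net rDNS."""
    return LAST_EXTERNAL.search(relays) is not None


def hits_any_relay(relays: str) -> bool:
    """True if any listed relay has a dynamic.hinet.net rDNS."""
    return ANY_RELAY.search(relays) is not None


if __name__ == "__main__":
    for name, relays in (("direct", direct), ("via_smarthost", via_smarthost)):
        print(name, hits_last_external(relays), hits_any_relay(relays))
```

The second case is the smarthost bypass described above: the anchored pattern stops matching once a clean relay sits in front of the HINET host, while a pattern that scans every relay would still fire.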