On 10/12/2013 9:28 PM, John Hardin wrote:
> On Sat, 12 Oct 2013, Stan Hoeppner wrote:
> 
>> Steve, the one who wrote this regex, would you please explain your
>> reasoning behind giving this rule a score so high as 2.8,
> 
> That score was auto-assigned by masscheck, where it is doing quite well:
> 
> http://ruleqa.spamassassin.org/?rule=FSL_HELO_BARE_IP_2
> 
>> and engage in discussion WRT lowering the score, eliminating the
>> overlap with the other bare IP HELO rules, etc?
> 
> It seems that 94% of the ham hits in masscheck are against list mail,
> and none of the spam hits are, so it would seem reasonable to add an
> exclusion for list messages.

I did some digging and have discovered precisely why FSL_HELO_BARE_IP_2,
RCVD_NUMERIC_HELO, et al. falsely hit on so much list mail.  It's quite
interesting.  Operators of newsgroup services that mirror/archive mailing
lists, and allow posting from a web interface, are adding forged
Received: headers before sending each message on to the respective list
server.

Apparently to keep a record in case of abuse, Gmane in particular
injects the rDNS name of the HTTP client machine into the EHLO
position of a Received: header, falling back to the bare IP on NXDOMAIN
or SERVFAIL.  There is no SMTP transaction between the hosts, only a PHP
form submission.  I just tested it:

...
Received: from plane.gmane.org (plane.gmane.org [80.91.229.3])
        (using TLSv1 with cipher AES256-SHA (256/256 bits))
        (Client did not present a certificate)
        by bendel.debian.org (Postfix) with ESMTPS id 7DD8CA6
        for <debian-u...@lists.debian.org>; Tue, 15 Oct 2013 07:40:05 +0000 (UTC)
Received: from list by plane.gmane.org with local (Exim 4.69)
        (envelope-from <gldu-debian-use...@m.gmane.org>)
        id 1VVzEY-0005lJ-P1
        for debian-u...@lists.debian.org; Tue, 15 Oct 2013 09:40:02 +0200
Received: from mo-65-41-216-221.sta.embarqhsd.net ([65.41.216.221])
        by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00
        for <debian-u...@lists.debian.org>; Tue, 15 Oct 2013 09:40:02 +0200
Received: from stan by mo-65-41-216-221.sta.embarqhsd.net with local (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00
        for <debian-u...@lists.debian.org>; Tue, 15 Oct 2013 09:40:02 +0200
X-Injected-Via-Gmane: http://gmane.org/


An example from my spam folder, where the PTR lookup for the client IP returns NXDOMAIN:
...
Received: from plane.gmane.org (plane.gmane.org [80.91.229.3])
        (using TLSv1 with cipher AES256-SHA (256/256 bits))
        (Client did not present a certificate)
        by bendel.debian.org (Postfix) with ESMTPS id BAA4A1F1
        for <debian-u...@lists.debian.org>; Sun, 13 Oct 2013 17:40:46 +0000 (UTC)
Received: from list by plane.gmane.org with local (Exim 4.69)
        (envelope-from <gldu-debian-use...@m.gmane.org>)
        id 1VVPel-0003yo-CD
        for debian-u...@lists.debian.org; Sun, 13 Oct 2013 19:40:43 +0200
Received: from 94.79.44.98 ([94.79.44.98])
        by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00
        for <debian-u...@lists.debian.org>; Sun, 13 Oct 2013 19:40:43 +0200
Received: from freehck by 94.79.44.98 with local (Gmexim 0.1 (Debian))
        id 1AlnuQ-0007hv-00
        for <debian-u...@lists.debian.org>; Sun, 13 Oct 2013 19:40:43 +0200
X-Injected-Via-Gmane: http://gmane.org/


In both cases the last two Received: headers in each message are
forgeries, as no SMTP transaction occurred.  I'm sure this violates more
than one SMTP RFC, but I doubt Gmane will change the way they do this
any time soon.
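
The header construction described above can be illustrated with a short
shell sketch.  It is a hypothetical reconstruction for illustration only,
not Gmane's code (which sits behind a PHP form): `lookup_ptr` and
`gmane_style_received` are made-up names, and `dig` is assumed to be
installed.  A PTR answer ends up in the EHLO slot, as with 65.41.216.221;
an empty answer (NXDOMAIN, SERVFAIL or a timeout) leaves the bare IP
there, as with 94.79.44.98, which is the pattern the bare-IP HELO rules
key on.

```sh
#!/bin/sh
# Illustrative only: mimics the Received: layout seen in the Gmane samples.
# This is NOT Gmane's actual implementation.

# Reverse-resolve an IPv4 address (assumes dig(1) is installed).  Prints the
# PTR name without its trailing dot, or nothing on NXDOMAIN/SERVFAIL/timeout.
# dig's ";;" error lines are filtered out; the last remaining line is used,
# so a CNAME hop from classless in-addr.arpa delegation is skipped.
lookup_ptr() {
    dig +short -x "$1" 2>/dev/null | grep -v '^;' | tail -n 1 | sed 's/\.$//'
}

# Print the first two lines of a Received: header with the client's rDNS
# name in the HELO position, falling back to the bare IP when there is none.
gmane_style_received() {
    ip=$1
    name=$(lookup_ptr "$ip")
    helo=${name:-$ip}
    printf 'Received: from %s ([%s])\n' "$helo" "$ip"
    printf '        by main.gmane.org with esmtp (Gmexim 0.1 (Debian))\n'
}
```

Because `lookup_ptr` is looked up at call time, it can be redefined to
return a fixed name (or nothing) to exercise both branches without DNS.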


My spam folder goes back to 09/2012 and holds 3341 messages.

$ grep -P "FSL_HELO_BARE_IP_2|RCVD_NUMERIC_HELO" /home/stan/mail/Recent-Spam -c
1188

$ grep "FSL_HELO_BARE_IP_2" /home/stan/mail/Recent-Spam -c
553

$ grep "main.gmane.org" /home/stan/mail/Recent-Spam -c
166

$ grep -B1 "main.gmane.org" /home/stan/mail/Recent-Spam
...
Received: from 209.239.228.34 ([209.239.228.34])
        by main.gmane.org with esmtp (Gmexim 0.1 (Debian))

$ grep -A53 "FSL_HELO_BARE_IP_2" /home/stan/mail/Recent-Spam | grep "main.gmane.org" -c
106

All 106 are ham.

$ grep -A53 "RCVD_NUMERIC_HELO" /home/stan/mail/Recent-Spam | grep "main.gmane.org" -c
147

All 147 are ham.
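
Note that `grep -c` counts matching lines, not messages, and the `-A53`
window only catches a Received: line within 53 lines after the rule name
(and may spill into the next message), so such counts can be off in
either direction.  A minimal per-message sketch follows, assuming the
folder is a classic mbox file in which every message starts with a
"From " separator line and body lines beginning with "From " are escaped
as ">From "; the function name `count_both` is hypothetical.  It only
looks at each message's header block, so a message that merely mentions
these strings in its body (a reply in this thread, say) is not counted.

```sh
#!/bin/sh
# count_both MBOX STRING1 STRING2
# Prints how many messages in MBOX contain both fixed strings somewhere in
# their header block (folded continuation lines included).
# Assumes classic mbox: each message begins with a "From " separator line.
count_both() {
    awk -v a="$2" -v b="$3" '
        /^From / {                        # separator: close out previous message
            total += (ha && hb)
            ha = 0; hb = 0; inhdr = 1
            next
        }
        inhdr && /^$/ { inhdr = 0 }       # first empty line ends the headers
        inhdr && index($0, a) { ha = 1 }  # fixed-string match, no regex
        inhdr && index($0, b) { hb = 1 }
        END { total += (ha && hb); print total + 0 }
    ' "$1"
}
```

Usage would be e.g.
`count_both /home/stan/mail/Recent-Spam FSL_HELO_BARE_IP_2 main.gmane.org`.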

Of the 553 that hit FSL_HELO_BARE_IP_2, some may be mail injected by
other such newsgroup gateways.  I didn't dig that deep, as we already have
plenty of data demonstrating that this test shouldn't be applied to list
mail.  In addition, any tests that match broadband/consumer rDNS
patterns in HELO strings should also exclude list mail, given that Gmane
et al. inject the HTTP client's rDNS name as the HELO string whenever
rDNS exists.  I assume such tests exist in SA, as they're fantastic for
identifying bot spam.
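
A rough sketch of the list-mail exclusion argued for here (and suggested
by John above), again in shell/awk.  It approximates a bare-IP HELO check
with its own regex and is not the actual FSL_HELO_BARE_IP_2 or
RCVD_NUMERIC_HELO implementation; treating a List-Id: or
X-Injected-Via-Gmane: header as the marker of list mail is an assumption,
and the function name `bare_ip_helo_check` is hypothetical.

```sh
#!/bin/sh
# bare_ip_helo_check MESSAGE_FILE
# Prints "hit" if some Received: header carries a bare IPv4 address in the
# HELO ("from") position and the message does not look like list mail;
# otherwise prints "skip".  Header names are matched case-sensitively here,
# although RFC 5322 treats them case-insensitively.
bare_ip_helo_check() {
    awk '
        /^$/ { exit }                              # headers end at first empty line
        /^(List-Id|X-Injected-Via-Gmane):/ { list = 1 }
        /^Received: from [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ (]/ { bare = 1 }
        END { print ((bare && !list) ? "hit" : "skip") }
    ' "$1"
}
```

In awk, `exit` in a main rule still runs the END block, so the verdict is
printed as soon as the header block has been read.  Inside SpamAssassin
the same idea would presumably be a meta rule; this only shows the logic.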

In fact, for a few years now I've maintained a Postfix PCRE table of
~1650 fully qualified expressions that match the rDNS patterns of
consumer ISPs worldwide.  It is evaluated during the SMTP session against
the client rDNS name or the HELO string, and REJECTs on a dynamic match
or PREPENDs a header on most generic but static-looking rDNS.  It's much,
much faster than doing such tests in SA, and uses far, far fewer
CPU/memory resources.  It's also one of the reasons I avoided using SA,
or any content filter, for many, many years.
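
For anyone wanting to try the same approach, a minimal sketch of the
Postfix side, assuming Postfix built with PCRE support.  The file name
/etc/postfix/rdns.pcre and both patterns are hypothetical examples
modeled on the host names in this thread, not entries from the table
described above.  check_client_access looks up the client hostname and
address, check_helo_access the HELO/EHLO argument, and in a pcre: table
the first matching expression wins.

```sh
#!/bin/sh
# Sketch only: hypothetical patterns, not the author's table.
cat > /etc/postfix/rdns.pcre <<'EOF'
# Dynamic-looking consumer rDNS: reject outright (hypothetical pattern).
/^[0-9]+-[0-9]+-[0-9]+-[0-9]+\.dyn\.example\.net$/  REJECT dynamic consumer rDNS
# Generic but static-looking rDNS: accept, but tag the message for later scoring.
/^mo-[0-9]+-[0-9]+-[0-9]+-[0-9]+\.sta\.embarqhsd\.net$/  PREPEND X-Generic-rDNS: static consumer pattern
EOF

# Apply the table to both the client rDNS name and the HELO string.
# Note: these two commands overwrite any existing values of the parameters.
postconf -e 'smtpd_client_restrictions = check_client_access pcre:/etc/postfix/rdns.pcre'
postconf -e 'smtpd_helo_restrictions = check_helo_access pcre:/etc/postfix/rdns.pcre'
postfix reload

# Show which action (if any) a given name maps to, without sending mail.
postmap -q mo-65-41-216-221.sta.embarqhsd.net pcre:/etc/postfix/rdns.pcre
```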

-- 
Stan


