So, why did I have to write this patch?

1) I use MAPS DUL and relays.osirusoft.com, which also has a DUL section. Machines listed on more than one DUL were being penalized twice. This actually led to non-spam being flagged as spam more than once (and my threshold is 7, not even 5).

2) Furthermore, we shouldn't overly penalize people for sending mail from a dialup IP if they properly relayed through their ISP.

3) There was no support for querying multi-RBL zones like relays.osirusoft.com. Let's say an IP is flagged with a score of 2.0 as an open relay in orbs. Because there is already a match for set relay, the osirusoft checks won't run against it, even though you would have had a match (2.0) plus a return code of 127.0.0.6, which would have given you another 3.0, so you lose a perfect 5.0 score.

4) Probably a few other problems of that sort.

Over the last 7-10 days, I tried different ways to fix this, some of them rather misguided: not running tests if other ones had already run, and adding overrides to ignore the first IP for DUL, which gets interesting when you compare checks in set dialup with checks in set relay that can also return a dialup match. Needless to say, this went nowhere; before long I couldn't understand my own code.

The next idea, changing the score of some rules to 0 if other ones had already matched, seemed misguided too, especially since I wasn't sure it wouldn't cause problems with spamd.

I eventually came up with this: adding rules to counter other ones, and using a function called check_two_rbl_results to add a negative score if two RBLs matched on the same thing. Putting osirusoft in set relay was also a mistake; I've put it in its own osirusoft set since it can have many different meanings.
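To make the compensation idea concrete, here is a minimal standalone Perl sketch. It is not part of the patch: two_rbl_results is a hypothetical restatement of the check_two_rbl_results test that takes its state as an argument, the %state hash only imitates the per-set rbl_IN_As_found strings, and the 2.0 score for the Osirusoft match is an assumption (the example rules give no score for it). Unlike the patch, the sketch quotes the addresses with \Q...\E so the dots match literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical per-message state: set name => space-separated A records
# returned by the RBL lookups for that set (imitates rbl_IN_As_found).
my %state = (
    dialup    => { rbl_IN_As_found => ' 127.0.0.3 ' },
    osirusoft => { rbl_IN_As_found => ' 127.0.0.3 127.0.0.6 ' },
);

# Returns 1 only if set1 saw addr1 and set2 saw addr2, 0 otherwise.
sub two_rbl_results {
    my ($state, $set1, $addr1, $set2, $addr2) = @_;
    for my $set ($set1, $set2) {
        return 0 unless defined $state->{$set}
                     && defined $state->{$set}{rbl_IN_As_found};
    }
    my $inas1 = ' ' . $state->{$set1}{rbl_IN_As_found} . ' ';
    my $inas2 = ' ' . $state->{$set2}{rbl_IN_As_found} . ' ';
    return ($inas1 =~ / \Q$addr1\E / && $inas2 =~ / \Q$addr2\E /) ? 1 : 0;
}

my %score = (
    RCVD_IN_DUL            => 4,    # from the example rules
    RCVD_IN_OSIRUSOFT_COM  => 2,    # assumed value, for illustration only
    Z_FUDGE_DUL_MAPS_OSIRU => -2,   # from the example rules
);

# Both DUL rules hit, so the fudge rule takes back part of the double penalty.
my $total = $score{RCVD_IN_DUL} + $score{RCVD_IN_OSIRUSOFT_COM};
$total += $score{Z_FUDGE_DUL_MAPS_OSIRU}
    if two_rbl_results(\%state, 'osirusoft', '127.0.0.3', 'dialup', '127.0.0.3');

print "total: $total\n";
```

The point of doing it this way is that every rule keeps its own score and still shows up in the report; the fudge rule only adjusts the sum when both lists agree on the same IP.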
Last, but not least, an RBL rule whose set name ends with -firsthop is magic: it only matches on the originating IP, provided there is a relay in the middle.

The rest should make sense if you look at the diff and the example RBL rules in the docs.

As mentioned in my previous message, this doesn't work right with SA 2.21 unless you disable the new sorting code, since it screws up dependencies.

Comments welcome

Marc

diff -urN SpamAssassin.orig/Conf.pm SpamAssassin/Conf.pm
--- SpamAssassin.orig/Conf.pm  Sat Apr 20 11:30:22 2002
+++ SpamAssassin/Conf.pm  Mon May 27 18:05:42 2002
@@ -99,6 +99,16 @@
   $self->{terse_report_template} = '';
   $self->{spamtrap_template} = '';
+  # What different RBLs consider a dialup IP -- Marc
+  $self->{dialup_codes} = {
+    "dialups.mail-abuse.org." => "127.0.0.3",
+    # For DUL + other codes, we ignore that it's on DUL
+    "rbl-plus.mail-abuse.org." => "127.0.0.2",
+    "relays.osirusoft.com." => "127.0.0.3",
+  };
+
+  $self->{num_check_received} = 2;
+
   $self->{razor_config} = $main->sed_path ("~/razor.conf");
   # this will be sedded by whitelist implementations, so ~ is OK
@@ -553,10 +563,100 @@
       $self->{spamtrap_template} = ''; next;
     }
+=item num_check_received { integer } (default: 2)
+
+How many received lines from and including the original mail relay
+do we check in RBLs (you'd want at least 1 or 2).
+Note that for checking against dialup lists, you can call check_rbl
+with a special set name of "set-firsthop" and this rule will only
+be matched against the first hop if there is more than one hop, so
+that you can set a negative score to not penalize people who properly
+relayed through their ISP.
+See dialup_codes for more details and an example
+
+=cut
+
+    if (/^num[-_]check[-_]received\s+(\d+)$/) {
+      $self->{num_check_received} = $1+0; next;
+    }
+
 ###########################################################################
     # SECURITY: no eval'd code should be loaded before this line.
     #
     if ($scoresonly && !$self->{allow_user_rules}) { goto failed_line; }
+
+
+# If you think, this is complex, you should have seen the four previous
+# implementations that I scratched :-)
+# Once you understand this, you'll see it's actually quite flexible -- Marc
+
+=item dialup_codes { "domain1" => "127.0.x.y", "domain2" => "127.0.a.b" }
+
+Default:
+{ "dialups.mail-abuse.org." => "127.0.0.3",
+# For DUL + other codes, we ignore that it's on DUL
+  "rbl-plus.mail-abuse.org." => "127.0.0.2",
+  "relays.osirusoft.com." => "127.0.0.3" };
+
+WARNING!!! When passing a reference to a hash, you need to put the whole hash in
+one line for the parser to read it correctly (you can check with spamassassin -D
+< mesg)
+
+Set this to what your RBLs return for dialup IPs
+It is used by dialup-firsthop and relay-firsthop rules so that you can match
+DUL codes and compensate DUL checks with a negative score if the IP is a dialup
+IP the mail originated from and it was properly relayed by a hop before reaching
+you (hopefully not your secondary MX :-D)
+The trailing "-firsthop" is magic, it's what triggers the RBL to only be run
+on the originating hop
+The idea is to not penalize (or penalize less) people who properly relayed
+through their ISP's mail server
+
+Here's an example showing the use of Osirusoft and MAPS DUL, as well as the use
+of check_two_rbl_results to compensate for a match in both RBLs
+
+header RCVD_IN_DUL eval:check_rbl('dialup', 'dialups.mail-abuse.org.')
+describe RCVD_IN_DUL Received from dialup, see
+http://www.mail-abuse.org/dul/
+score RCVD_IN_DUL 4
+
+header X_RCVD_IN_DUL_FH eval:check_rbl('dialup-firsthop',
+'dialups.mail-abuse.org.')
+describe X_RCVD_IN_DUL_FH Received from first hop dialup, see
+http://www.mail-abuse.org/dul/
+score X_RCVD_IN_DUL_FH -3
+
+header RCVD_IN_OSIRUSOFT_COM eval:check_rbl('osirusoft', 'relays.osirusoft.com.')
+describe RCVD_IN_OSIRUSOFT_COM Received via an IP flagged in relays.osirusoft.com
+
+header X_OSIRU_SPAM_SRC eval:check_rbl_results_for('osirusoft', '127.0.0.4')
+describe X_OSIRU_SPAM_SRC DNSBL: sender is Confirmed Spam Source, penalizing
+further
+score X_OSIRU_SPAM_SRC 3.0
+
+header X_OSIRU_SPAMWARE_SITE eval:check_rbl_results_for('osirusoft', '127.0.0.6')
+describe X_OSIRU_SPAMWARE_SITE DNSBL: sender is a Spamware site or vendor,
+penalizing further
+score X_OSIRU_SPAMWARE_SITE 3.0
+
+header X_OSIRU_DUL_FH eval:check_rbl('osirusoft-dul-firsthop',
+'relays.osirusoft.com.')
+describe X_OSIRU_DUL_FH Received from first hop dialup listed in
+relays.osirusoft.com
+score X_OSIRU_DUL_FH -1.5
+
+header Z_FUDGE_DUL_MAPS_OSIRU eval:check_two_rbl_results('osirusoft', "127.0.0.3",
+'dialup', "127.0.0.3")
+describe Z_FUDGE_DUL_MAPS_OSIRU Do not double penalize for MAPS DUL and
+Osirusoft DUL
+score Z_FUDGE_DUL_MAPS_OSIRU -2
+
+header Z_FUDGE_RELAY_OSIRU eval:check_two_rbl_results('osirusoft', "127.0.0.2",
+'relay', "127.0.0.2")
+describe Z_FUDGE_RELAY_OSIRU Do not double penalize for being an open relay on
+Osirusoft and another DNSBL
+score Z_FUDGE_RELAY_OSIRU -2
+
+header Z_FUDGE_DUL_OSIRU_FH eval:check_two_rbl_results('osirusoft-dul-firsthop',
+"127.0.0.3", 'dialup-firsthop', "127.0.0.3")
+describe Z_FUDGE_DUL_OSIRU_FH Do not double compensate for MAPS DUL and Osirusoft
+DUL first hop dialup
+score Z_FUDGE_DUL_OSIRU_FH 1.5
+
+=cut
+
+    if (/^dialup_codes\s+(.*)$/) {
+      $self->{dialup_codes} = eval $1;
+      next;
+    }
+
 =back
diff -urN SpamAssassin.orig/Dns.pm SpamAssassin/Dns.pm
--- SpamAssassin.orig/Dns.pm  Mon May 27 15:56:56 2002
+++ SpamAssassin/Dns.pm  Mon May 27 17:15:33 2002
@@ -53,28 +53,53 @@
 ###########################################################################
 sub do_rbl_lookup {
-  my ($self, $set, $dom, $ip, $found) = @_;
+  my ($self, $set, $dom, $ip, $found, $dialupreturn) = @_;
   return $found if $found;
+  my $gotdialup=0;
+  my $domainonly;
+  ($domainonly = $dom) =~ s/^\d+\.\d+\.\d+\.\d+.//;
+  $domainonly =~ s/\.?$/./;
+
   my $q = $self->{res}->search ($dom);
   if ($q) {
     foreach my $rr ($q->answer) {
       if ($rr->type eq "A") {
         my $addr = $rr->address();
-        dbg ("record found for $dom = $addr");
-        if ($addr ne '127.0.0.2' && $addr ne '127.0.0.3') {
-          $self->test_log ("RBL check: found ".$dom.", type: ".$addr);
-        } else {
-          # 127.0.0.2 is the traditional boolean indicator, don't log it
-          # 127.0.0.3 now also means "is a dialup IP"
-          $self->test_log ("RBL check: found ".$dom);
-        }
+        # We might as well log the time for all the RBLs, 127.0.0.2 and .3 do
+        # have significant meanings in some RBLs and just mean yes in others
+        # -- Marc
+        $self->test_log ("RBL check: found ".$dom.", type: ".$addr);
+        dbg("RBL check: found $dom, type: $addr");
         $self->{$set}->{rbl_IN_As_found} .= $addr.' ';
         $self->{$set}->{rbl_matches_found} .= $ip.' ';
-        return ($found+1);
+
+        # If $dialupreturn is a reference to a hash, we were told to ignore
+        # dialup IPs, let's see if we have a match
+        if ($dialupreturn) {
+          my $toign;
+          dbg("Checking dialup_codes for $addr as a DUL code for $domainonly");
+
+          foreach $toign (keys %{$dialupreturn}) {
+            dbg("Comparing against $toign/".$dialupreturn->{$toign});
+            $toign =~ s/\.?$/./;
+            if ($domainonly eq $toign and $addr eq $dialupreturn->{$toign}) {
+              dbg("Got $addr in $toign for $ip, good, we'll take it");
+              $gotdialup=1;
+              last;
+            }
+          }
+
+          if (not $gotdialup) {
+            dbg("Ignoring return $addr for $ip, not known as dialup for $domainonly in
+dialup_code variable");
+            next;
+          }
+        }
+
+        return 1;
       }
     }
   }
diff -urN SpamAssassin.orig/EvalTests.pm SpamAssassin/EvalTests.pm
--- SpamAssassin.orig/EvalTests.pm  Thu Apr 11 09:32:35 2002
+++ SpamAssassin/EvalTests.pm  Mon May 27 18:09:33 2002
@@ -424,6 +424,9 @@
 sub check_rbl {
   my ($self, $set, $rbl_domain) = @_;
   local ($_);
+  # How many IPs max you check in the received lines;
+  my $checklast=$self->{conf}->{num_check_received} - 1;
+
   dbg ("checking RBL $rbl_domain, set $set");
   my $rcv = $self->get ('Received');
@@ -435,9 +438,11 @@
   return 0 unless $self->is_dns_available();
   $self->load_resolver();
+  #dbg("Got the following IPs: ".join(", ", @ips));
   if ($#ips > 1) {
-    @ips = @ips[$#ips-1 .. $#ips]; # only check the originating 2
+    @ips = @ips[$#ips-$checklast .. $#ips]; # only check the originating IPs
   }
+  #dbg("But only inspecting the following IPs: ".join(", ", @ips));
   if (!defined $self->{$set}->{rbl_IN_As_found}) {
     $self->{$set}->{rbl_IN_As_found} = ' ';
@@ -448,17 +453,54 @@
   my $already_matched_in_other_zones = ' '.$self->{$set}->{rbl_matches_found}.' ';
   my $found = 0;
-  # First check that DNS is available, if not do not perform this check.
+  # First check that DNS is available. If not, do not perform this check.
   # Stop after the first positive.
   eval {
+    my $i=0;
+    my ($b1,$b2,$b3,$b4);
+    my $dialupreturn;
     foreach my $ip (@ips) {
+      $i++;
       next if ($ip =~ /${IP_IN_RESERVED_RANGE}/o);
-      next if ($already_matched_in_other_zones =~ / ${ip} /);
+      # Some of the matches in other zones, like a DUL match on a first hop
+      # may be negated by another rule, so preventing a match in two zones
+      # is better done with a Z_FUDGE_foo rule that uses check_two_rbl_results
+      # and sets a negative score to compensate
+      # It's also useful to be able to flag mail that went through an IP that
+      # is on two different blacklists -- Marc
+      #next if ($already_matched_in_other_zones =~ / ${ip} /);
+      if ($already_matched_in_other_zones =~ / ${ip} /) {
+        dbg("Skipping $ip, already matched in other zones for $set");
+        next;
+      }
       next unless ($ip =~ /(\d+)\.(\d+)\.(\d+)\.(\d+)/);
-      $found = $self->do_rbl_lookup ($set, "$4.$3.$2.$1.".$rbl_domain, $ip, $found);
+      ($b1, $b2, $b3, $b4) = ($1, $2, $3, $4);
+
+      # By default, we accept any return on an RBL
+      undef $dialupreturn;
+
+      # foo-firsthop are special rule names that only match on the
+      # first Received line (used to give a negative score to counter the
+      # normal dialup rule and not penalize people who relayed through their
+      # ISP) -- Marc
+      # By default this rule won't get run unless it's the first hop IP
+      if ($set =~ /-firsthop$/) {
+        if ($#ips>0 and $i == $#ips + 1) {
+          dbg("Set dialupreturn on $ip for first hop");
+          $dialupreturn=$self->{conf}->{dialup_codes};
+          die "$self->{conf}->{dialup_codes} undef" if (!defined $dialupreturn);
+        } else {
+          dbg("Not running firsthop rule against middle hop or direct dialup IP
+connection (ip $ip)");
+          next;
+        }
+      }
+
+      $found = $self->do_rbl_lookup ($set, "$b4.$b3.$b2.$b1.".$rbl_domain, $ip,
+$found, $dialupreturn);
+      #dbg("Got $found on $ip (item $i)");
     }
   };
+  #dbg("Check_rbl returning $found");
   $found;
 }
@@ -478,6 +520,26 @@
   return 0;
 }
+
+###########################################################################
+
+sub check_two_rbl_results {
+  my ($self, $set1, $addr1, $set2, $addr2) = @_;
+
+  return 0 if $self->{conf}->{skip_rbl_checks};
+  return 0 unless $self->is_dns_available();
+  return 0 unless defined ($self->{$set1});
+  return 0 unless defined ($self->{$set2});
+  return 0 unless defined ($self->{$set1}->{rbl_IN_As_found});
+  return 0 unless defined ($self->{$set2}->{rbl_IN_As_found});
+
+  my $inas1 = ' '.$self->{$set1}->{rbl_IN_As_found}.' ';
+  my $inas2 = ' '.$self->{$set2}->{rbl_IN_As_found}.' ';
+  if ($inas1 =~ / ${addr1} / and $inas2 =~ / ${addr2} /) { return 1; }
+
+  return 0;
+}
+
 ###########################################################################
diff -urN SpamAssassin.orig/PerMsgStatus.pm SpamAssassin/PerMsgStatus.pm
--- SpamAssassin.orig/PerMsgStatus.pm  Sun Apr 14 04:12:53 2002
+++ SpamAssassin/PerMsgStatus.pm  Mon May 27 01:44:03 2002
@@ -1528,6 +1528,8 @@
     eval {
       $result = $self->$evalsub(@args);
     };
+    dbg("Running rule $rulename and got result $result");
+
     if ($@) {
       warn "Failed to run $rulename SpamAssassin test, skipping:\n".
            "\t($@)\n";

--
Microsoft is to operating systems & security .... ....
what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | Finger [EMAIL PROTECTED] for PGP key