Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new

Ben Johnson Wed, 16 Jan 2013 10:19:13 -0800


On 1/16/2013 11:00 AM, John Hardin wrote:
> On Wed, 16 Jan 2013, Ben Johnson wrote:
> 
>> On 1/15/2013 5:22 PM, John Hardin wrote:
>>> On Tue, 15 Jan 2013, Ben Johnson wrote:
>>>>
>>>> Wow! Adding several more reject_rbl_client entries to the
>>>> smtpd_recipient_restrictions directive in the Postfix configuration
>>>> seems to be having a tremendous impact. The amount of spam coming
>>>> through has dropped by 90% or more. This was a HUGELY helpful
>>>> suggestion, John!
>>>
>>> Which ones are you using now? There are DNSBLs that are good, but not
>>> quite good enough to trust as hard-reject SMTP-time filters. That's why
>>> SA does scored DNSBL checks.
>>
>> smtpd_recipient_restrictions =
>>     reject_rbl_client bl.spamcop.net,
>>     reject_rbl_client list.dsbl.org,
>>     reject_rbl_client sbl-xbl.spamhaus.org,
>>     reject_rbl_client cbl.abuseat.org,
>>     reject_rbl_client dul.dnsbl.sorbs.net,
> 
> Several of those are combined into ZEN. If you use ZEN instead you'll
> save some DNS queries. See the Spamhaus link I provided earlier for
> details; I don't remember offhand which ones go into ZEN.


Per Noel's advice, I have shortened the list (dsbl.org is defunct) and
acted upon your mutual suggestion regarding ZEN:

reject_rbl_client bl.spamcop.net,
reject_rbl_client zen.spamhaus.org,
reject_rbl_client dnsbl.sorbs.net,

Indeed, block entries for all three lists are being registered in the
mail log. Very nice.

It seems as though adding these SMTP-time rejects has blocked about half
of the spam that was coming through previously. Awesome.
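
For anyone following along, here is a minimal sketch of what a DNSBL
check such as reject_rbl_client amounts to: the client's IPv4 octets are
reversed, the list's zone is appended, and the resulting name is looked
up in DNS. The function names below are my own, purely for illustration;
this is not how Postfix implements it internally, and what a given
answer means depends on the list operator's documentation.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name that is queried for an IPv4 client on a DNSBL zone."""
    # Validate the address first; raises ValueError on malformed input.
    ip = ipaddress.IPv4Address(client_ip)
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_lookup(client_ip: str, zone: str):
    """Return the A record the list answers with, or None if no record exists.

    Hypothetical helper: interpreting the returned address (listed,
    which sub-list, or an error code) is left to the caller.
    """
    try:
        return socket.gethostbyname(dnsbl_query_name(client_ip, zone))
    except socket.gaierror:
        return None


# Name construction only; no network access is needed for this part.
print(dnsbl_query_name("192.0.2.1", "zen.spamhaus.org"))
# prints: 1.2.0.192.zen.spamhaus.org
```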

>> These are "hard rejects", right? So if this change has reduced spam,
>> said spam would not be accepted for delivery at all; it would be
>> rejected outright. Correct? (And if I understand you, this is part of
>> your concern.)
> 
> Correct.
> 
>> The reason I ask, and a point that I should have clarified in my last
>> post, is that the *volume* of spam didn't drop by 90% (although, it may
>> have dropped by some measure), but rather the accuracy with which SA
>> tagged spam was 90% higher.
> 
> That's odd. That suggests your SA wasn't looking up those DNSBLs, or
> they would have contributed to the score.
> 
> Check your trusted networks setting. One difference between SMTP-time
> and SA-time DNSBL checks is that SMTP-time checks the IP address of the
> client talking to the MTA, while SA-time can go back up the relay chain
> if necessary (e.g. to check the client IP submitting to your ISP if your
> ISP's MTA is between your MTA and the Internet, rather than always
> checking your ISP's MTA IP address).

Are you referring to SA's "trusted_networks" directive? If so, it is
commented-out (presumably by default). Does this need to be set? I've
read the info re: trusted_networks at
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html ,
but I'm struggling to understand it.

If the info is helpful, I have a very simple setup here: a single server
with a single public IP address and a single MTA.
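
To check my own understanding of the relay-chain point, here is a rough
sketch of the idea as I read it: walk the relays from the hop nearest my
MTA outward, skip any that fall inside a trusted network, and test the
first one that does not. The function name, the ordering of the chain,
and the documentation-range addresses are all assumptions for
illustration; this is not SpamAssassin's actual Received-header logic.

```python
import ipaddress


def first_untrusted_relay(relay_chain, trusted_networks):
    """Return the first relay IP that is outside every trusted network.

    relay_chain: IP strings ordered from the hop closest to our MTA
                 back toward the original sender (assumed ordering).
    trusted_networks: IPs or CIDR strings, e.g. "198.51.100.0/24".
    """
    nets = [ipaddress.ip_network(n) for n in trusted_networks]
    for ip in relay_chain:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    # Every hop was trusted, so there is nothing to check.
    return None


# Hypothetical chain: an ISP relay handed us the message on behalf of a client.
chain = ["198.51.100.7", "203.0.113.50"]

# With nothing trusted, the ISP relay itself is the address that gets checked.
print(first_untrusted_relay(chain, []))

# With the ISP's range trusted, the check moves back to the real client.
print(first_untrusted_relay(chain, ["198.51.100.0/24"]))
```

In a single-server setup like mine, where the MTA talks directly to the
Internet, both views would presumably land on the same address.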

>> Ultimately, I'm wondering if the observed change was simply a product of
>> these message "campaigns" being black-listed after a few days of
>> circulation, and not the Postfix configuration change.
> 
> Maybe.
> 
>> At this point, the vast majority of X-Spam-Status headers include Razor2
>> and Pyzor tests that contribute significantly to the score. I should
>> have mentioned earlier that I installed Razor2 and Pyzor after making my
>> initial post. The only reasons I didn't are that a) they didn't seem to
>> be making a significant difference for the first day or so after I
>> installed them (this could be for the snowshoe reasons we've already
>> discussed), and b) the low Bayes scores seemed to be the real problem
>> anyway.
>>
>> That said, the Bayes scores seem to be much more accurate now, too. I
>> was hardly ever seeing BAYES_99 before, but now almost all spam messages
>> have BAYES_99.
> 
> Odd. SMTP-time hard rejects shouldn't change that.

That's what I figured. I wonder whether training Bayes on all of the
messages that I "auto-learned manually" (messages that were tagged as
spam, but for reasons unrelated to Bayes) contributed significantly to
this change. I did this late yesterday afternoon, and when I checked
this morning, I was seeing BAYES_99 on almost every message.
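
In case it helps anyone doing the same kind of status check, this is a
small sketch for pulling the rule names out of an X-Spam-Status value so
that BAYES_99 hits can be counted. The two sample header values are made
up, and the formats (a plain comma-separated list, and a bracketed list
with per-rule scores) are assumptions about what SA and Amavis emit, so
treat this as a starting point only.

```python
import re


def rule_names(status_value: str):
    """Extract rule names (e.g. BAYES_99) from an X-Spam-Status value."""
    # Only look at what follows "tests="; rule names are assumed to be
    # upper-case tokens, while trailing fields like autolearn= are not.
    _, _, tail = status_value.partition("tests=")
    return re.findall(r"\b[A-Z][A-Z0-9_]+\b", tail)


# Made-up sample values in the two assumed formats.
samples = [
    "Yes, score=7.2 required=5.0 tests=BAYES_99,RAZOR2_CHECK autolearn=no",
    "Yes, score=6.1 required=5.0 tests=[BAYES_99=3.5, PYZOR_CHECK=1.4] autolearn=no",
    "No, score=-1.9 required=5.0 tests=[BAYES_00=-1.9] autolearn=ham",
]

bayes_99_hits = sum("BAYES_99" in rule_names(s) for s in samples)
print(f"{bayes_99_hits} of {len(samples)} sample headers hit BAYES_99")
```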

>> Is it possible that the training I've been doing over the last week or
>> so wasn't *effective* until recently, say, after restarting some
>> component of the mail stack? My understanding is that calling SA via
>> Amavis, which does not need/use the spamd daemon, forces all Bayes data
>> to be up-to-date on each call to spamassassin.
> 
> That shouldn't be the case. SA and sa-learn both use a shared-access
> database; if you're training the database that SA is learning, the
> results of training should be effective immediately.
> 

Okay, good. Bowie's response to this question differed (he suggested
that Amavis would need to be restarted for Bayes to be updated), but I'm
pretty sure that restarting Amavis is not necessary. It seems unlikely
that Amavis would copy the entire Bayes DB (which is stored in MySQL on
this server) into memory every time that the Amavis service is started.
To do so seems self-defeating: more RAM usage, worse performance, etc.

So, I emptied the Bayes DB and re-trained ham and spam on my hand-sorted
corpus. The net result was to discard all previous end-user training, if
I understand correctly.

Everything still looks good; mostly BAYES_99 on the messages that are
and should be marked as spam, and no false-positives at all.

I've disabled the Antispam plug-in for now, for the reasons we've
already discussed. I have asked the Dovecot mailing list for suggestions
regarding how best to pre-screen end-user training submissions.

I think I'm in pretty good shape here, unless setting trusted_networks
is a must, in which case I could use some guidance.

All the best,

-Ben
