Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new

John Hardin Tue, 15 Jan 2013 14:23:23 -0800

On Tue, 15 Jan 2013, Ben Johnson wrote:



On 1/15/2013 1:55 PM, John Hardin wrote:

On Tue, 15 Jan 2013, Ben Johnson wrote:

On 1/14/2013 8:16 PM, John Hardin wrote:

On Mon, 14 Jan 2013, Ben Johnson wrote:

Question: do you have any SMTP-time hard-reject DNSBL tests in place? Or
are they all performed by SA?


In postfix's main.cf:

smtpd_recipient_restrictions = permit_mynetworks,
permit_sasl_authenticated, check_recipient_access
mysql:/etc/postfix/mysql-virtual_recipient.cf,
reject_unauth_destination, reject_rbl_client bl.spamcop.net

Do you recommend something more?


Unfortunately I have no experience administering Postfix. Perhaps one of
the other listies can help.


Wow! Adding several more reject_rbl_client entries to the
smtpd_recipient_restrictions directive in the Postfix configuration
seems to be having a tremendous impact. The amount of spam coming
through has dropped by 90% or more. This was a HUGELY helpful
suggestion, John!

Which ones are you using now? There are DNSBLs that are good, but not
quite good enough to trust as hard-reject SMTP-time filters. That's why
SA does scored DNSBL checks.

Yes, users are allowed to train Bayes, via Dovecot's Antispam plug-in.
They do so unsupervised. Why this could be a problem is obvious. And no,
I don't retain their submissions. I probably should. I wonder if I can
make a few slight modifications to the shell script that Antispam calls,
such that it simply sends a copy of the message to an administrator
rather than calling sa-learn on the message.


That would be a very good idea if the number of users doing training is
small. At the very least, the messages should be captured to a permanent
corpus mailbox.


Good idea! I'll see if I can set this up.

Do your users also train ham? Are the procedures similar enough that
your users could become easily confused?


They do. The procedure is implemented via Dovecot's Antispam plug-in.
Basically, moving mail from Inbox to Junk trains it as spam, and moving
mail from Junk to Inbox trains it as ham. I really like this setup
(Antispam + calling SA through Amavis [i.e. not using spamd]) because
the results are effective immediately, which seems to be crucial for
combating this snowshoe spam (performance and scalability aside).

I don't find that procedure to be confusing, but people are different, I
suppose.

Hm. One thing I would watch out for in that environment is people who
have intentionally subscribed to some sort of mailing list deciding they
don't want to receive it any longer and just junking the messages rather
than unsubscribing.


However, your problem is FN Bayes scores...

The extremely odd thing is that you say you sometimes train a message as
spam, and its Bayes score goes *down*. Are you training a message and
then running it through spamc to see if the score changed, or is this
about _similar_ messages rather than _that_ message?


Sorry for the ambiguity. This is about *similar* messages. Identical
messages, at least visually speaking (I realize that there is a lot more
to it than the visual component). For example, yesterday, I saw several
Canadian Pharmacy emails, all of which were identical with respect to
appearance. I classified each as spam, yet the Bayes score didn't budge
more than a few percent for the first three, and went *down* for the 4th.

I have to assume that while the messages (HTML-formatted) *appear* to be
identical, the underlying code has some pseudo-random element that is
designed very specifically to throw Bayes classifiers.

Out of curiosity, does the Bayes engine (or some other element of
SpamAssassin) have the ability to "see" rendered HTML messages, by
appearance, and not by source code? If it could, it would be far more
effective, it seems.


That I don't know.

That, and configure the user-based training to at the very least capture
what they submit to a corpus so you can review it. Whether you do that
review pre-training or post-bayes-is-insane is up to you.


Right, right, that makes sense. I hope I can modify the Antispam plug-in
to accommodate this requirement.

Well, I can't thank you enough here, John and everyone else. I seem to
be on the right track; all is not lost.

That said, it seems clear that SA is nowhere near as effective as it can
be when an off-the-shelf configuration is used (and without configuring
the MTA to do some of the blocking).

I'll keep the list posted (pardon the pun) with regard to configuring
Antispam to fire-off a copy of any message that is submitted for
training. Ideally, whether the message is reviewed before or after
sa-learn is called will be configurable.
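
For illustration, here is a minimal sketch of what such a wrapper might
look like. It is not the script that ships with the Antispam plug-in.
It assumes the message arrives on stdin and that the first argument is
"spam" or "ham"; the function name capture_message, the variables
CORPUS_DIR and TRAIN_NOW, and the default corpus path are all
hypothetical. Only the sa-learn --spam / --ham options are real.

```shell
#!/bin/sh
# Hypothetical training wrapper: always keep a copy of the submitted
# message in a corpus directory, and optionally call sa-learn on it.
# Assumed (not from the thread): message on stdin, class as argument 1,
# and the CORPUS_DIR / TRAIN_NOW environment variables.

capture_message() {
    class="$1"                                    # "spam" or "ham"
    corpus="${CORPUS_DIR:-/var/spool/sa-corpus}"  # hypothetical default

    case "$class" in
        spam|ham) ;;
        *) echo "usage: capture_message spam|ham" >&2; return 2 ;;
    esac

    mkdir -p "$corpus/$class" || return 1

    # mktemp gives every submission its own file, so nothing is overwritten.
    file=$(mktemp "$corpus/$class/msg.XXXXXX") || return 1
    cat > "$file" || return 1

    # TRAIN_NOW=1: train immediately and review the stored copy afterwards.
    # Otherwise (default): only store the copy; an admin reviews and
    # trains later.
    if [ "${TRAIN_NOW:-0}" = "1" ]; then
        sa-learn "--$class" "$file"
    fi
}
```

A script invoked by the plug-in could then end with something like
`capture_message spam` (or `ham`), with TRAIN_NOW set according to
whether review should happen before or after training. Note that
sa-learn has to run as the user that owns the Bayes database that
Amavis uses, or the training will land in a different database.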


Great! Thanks!

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Your mouse has moved. Your Windows Operating System must be
  relicensed due to this hardware change. Please contact Microsoft
  to obtain a new activation key. If this hardware change results in
  added functionality you may be subject to additional license fees.
  Your system will now shut down. Thank you for choosing Microsoft.
-----------------------------------------------------------------------
 2 days until Benjamin Franklin's 307th Birthday
