Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new

Ben Johnson Fri, 01 Feb 2013 08:45:07 -0800


On 1/31/2013 5:50 PM, RW wrote:
> On Thu, 31 Jan 2013 12:12:15 -0800 (PST)
> John Hardin wrote:
> 
>> On Thu, 31 Jan 2013, Ben Johnson wrote:
>>
> 
>>> So, I finally got around to tackling this change.
>>>
>>> With a couple of simple modifications, I was able to achieve the
>>> desired result with the Dovecot Antispam plug-in.
>>>
>>> Basically, I changed the last two directive values from the switches
>>> that are normally passed to the "sa-learn" binary (--spam and
>>> --ham) to destination email addresses that are passed to "sendmail"
>>> in my revised pipe script.
>>
>> Passing the messages through sendmail again isn't optimal as that
>> will make further changes to the headers. This may have effects on
>> the quality of the learning, unless the original message is attached
>> as an RFC-822 attachment to the message being sent to the corpus
>> mailbox, which of course means you then can't just run sa-learn
>> directly against that mailbox - the review process would involve
>> moving the attachment as a standalone message to the spam or ham
>> learning mailbox.
>>
>> Ideally you want to just move the messages between mailboxes without 
>> involving another delivery processing. I don't know enough about
>> Dovecot or your topology to say whether that's going to be as easy as
>> using sendmail to mail the message to you.
> 
> Actually that's the way that the dovecot plugin works. I think that the
> sendmail option is mainly a way to get training done on a remote
> machine - it's a standard feature of DSPAM for which the plugin was
> originally developed.
> 
> When I looked at the plugin it seemed to have quite a serious flaw.
> IIRC it disables IMAP APPENDs on the Spam folder which makes it
> incompatible with synchronisation tools like OfflineImap and probably
> some IMAP clients that implement offline support in the same way.
>


John, thanks for pointing out the problems associated with re-sending
the messages via sendmail.

I threw a line out to the Dovecot users group and learned how to move
messages without going through the MTA. Dovecot has a utility
executable, "deliver", which is well-suited to the task.

For those who may have a similar need, here's the Dovecot Antispam pipe
script that I'm using, courtesy of Steffen Kaiser on the Dovecot Users
mailing list:

---------------------------------------
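#!/bin/bash
# Dovecot Antispam pipe script: the plugin passes --spam or --ham as an
# argument and supplies the message on stdin.

mode=
for opt; do
        # Check each argument on its own ("$opt", not "$*"), so extra
        # arguments do not prevent a match.
        if test "x$opt" = "x--ham"; then
                mode=HAM
                break
        elif test "x$opt" = "x--spam"; then
                mode=SPAM
                break
        fi
done

if test -n "$mode"; then
        # options from http://wiki1.dovecot.org/LDA
        # deliver reads the message from stdin and files it directly into
        # Training.HAM or Training.SPAM, bypassing the MTA.
        /usr/lib/dovecot/deliver -d u...@example.com -m "Training.$mode"
fi

# Always exit 0, even if deliver fails.
exit 0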
---------------------------------------
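For anyone adapting the script, the argument handling can be checked
without delivering anything. Here is a minimal bash sketch of the same
mode detection; "training_folder" is a hypothetical helper of my own (not
part of Dovecot or the plugin) that prints the target folder instead of
calling deliver:

```shell
#!/bin/bash
# Hypothetical helper: maps the plugin's arguments to the folder the pipe
# script would deliver into. Prints nothing and returns 1 if neither
# --ham nor --spam is present. Like the script, the first match wins.
training_folder() {
        local opt
        for opt; do
                case "$opt" in
                        --ham)  echo "Training.HAM";  return 0 ;;
                        --spam) echo "Training.SPAM"; return 0 ;;
                esac
        done
        return 1
}

training_folder --spam          # prints Training.SPAM
training_folder -f x --ham      # prints Training.HAM
```

Checking "$opt" rather than "$*" is what lets the second call match even
though other arguments come first.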


And here are the Antispam plug-in options:


---------------------------------------
  # For Dovecot < 2.0.
  antispam_spam_pattern_ignorecase = SPAM;JUNK
  antispam_mail_tmpdir = /tmp
  antispam_mail_sendmail = /usr/bin/sa-learn-pipe.sh
  antispam_mail_spam = --spam
  antispam_mail_notspam = --ham
---------------------------------------
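To see what the pipe script actually receives, the plugin's call can be
imitated with a stub. This is a sketch under my assumption that, with the
settings above, the plugin runs antispam_mail_sendmail with
antispam_mail_spam or antispam_mail_notspam as its argument and pipes the
message to its stdin. The stub is hypothetical and only records its input:

```shell
#!/bin/bash
# Imitates (assumption) the plugin calling the pipe script:
# argument = --spam or --ham, message on stdin.
set -e
tmp=$(mktemp -d)
stub="$tmp/sa-learn-pipe-stub.sh"   # hypothetical stand-in for the real script

# The stub writes out what it was given instead of calling deliver.
cat > "$stub" <<'EOF'
#!/bin/bash
dir=$(dirname "$0")
printf '%s\n' "$*" > "$dir/args"
cat > "$dir/message"
EOF
chmod +x "$stub"

# Roughly what happens when a message is moved into a SPAM folder.
printf 'Subject: test\n\nhello\n' | "$stub" --spam

args=$(cat "$tmp/args")
subject=$(head -n 1 "$tmp/message")
rm -r "$tmp"

echo "args:    $args"
echo "subject: $subject"
```

Swapping the stub for /usr/bin/sa-learn-pipe.sh (on a test account) would
exercise the real deliver call the same way.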

RW, thank you for underscoring the issue with IMAP appends. It looks as
though a configuration directive exists to control this behavior:

# Whether to allow APPENDing to SPAM folders or not. Must be set to
# "yes" (case insensitive) to be activated. Before activating, please
# read the discussion below.
# antispam_allow_append_to_spam = no

Unfortunately, I don't fully understand the implications of enabling or
disabling this option. Here's the "discussion below" that is referenced
in the above comment:

---------------------------------------
ALLOWING APPENDS?

You should be careful with allowing APPENDs to SPAM folders. The reason
for possibly allowing it is to allow not-SPAM --> SPAM transitions to
work with offlineimap. However, because with APPEND the plugin cannot
know the source of the message, multiple bad scenarios can happen:

1. SPAM --> SPAM transitions cannot be recognised and are trained

2. the same holds for Trash --> SPAM transitions

Additionally, because we cannot recognise SPAM --> not-SPAM
transitions, training good messages will never work with APPEND.
---------------------------------------

As for the first point, what is a "SPAM --> SPAM transition"? Is that
when the mailbox contains more than one spam folder, e.g., "JUNK" and
"SPAM", and the user drags a message from one to the other?

Regarding the second point, I'm not sure I understand the problem. If
someone drags a message from Trash to SPAM, shouldn't it be submitted
for learning as spam?

The last sentence sounds like a potential deal-breaker. Doesn't my
whole strategy go limp if ham cannot be submitted for training?
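To check my reading of the three scenarios, here they are as a toy bash
model. It is my own sketch, not the plugin's code, and it assumes (from
the discussion above) that on a normal move the plugin ignores SPAM -->
SPAM and Trash --> SPAM and trains SPAM --> not-SPAM as ham. A source of
"?" stands for an APPEND, where the source folder is unknown:

```shell
#!/bin/bash
# Toy model (assumption, not the plugin's code) of when training happens.

# is_spam FOLDER: true if FOLDER matches SPAM or JUNK, case-insensitively,
# mirroring antispam_spam_pattern_ignorecase = SPAM;JUNK.
is_spam() {
        case "${1^^}" in
                SPAM|JUNK) return 0 ;;
                *) return 1 ;;
        esac
}

# training SOURCE DEST: prints spam, ham or none.
training() {
        local src=$1 dst=$2
        if [ "$src" = "?" ]; then
                # APPEND: only the destination is known.
                if is_spam "$dst"; then echo spam; else echo none; fi
                return
        fi
        if is_spam "$dst" && ! is_spam "$src" && [ "$src" != Trash ]; then
                echo spam
        elif is_spam "$src" && ! is_spam "$dst"; then
                echo ham
        else
                echo none
        fi
}

training INBOX SPAM    # spam
training SPAM JUNK     # none
training Trash SPAM    # none
training SPAM INBOX    # ham
training '?' SPAM      # spam, even if it came from JUNK or Trash
training '?' INBOX     # none, so ham is never trained via APPEND
```

If that model is right, the last two lines are exactly points 1, 2 and
the "Additionally" sentence.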

John and RW, do you recommend enabling or disabling the append option,
given the way I'm reviewing the submissions and sorting them manually?

Sorry for all the questions! And thanks!

-Ben
