Re: per-recipient configuration

Jared Johnson Mon, 26 Jul 2010 22:36:49 -0700

> Having a common API for "per-recipient" things have long been on the todo
> list.
>
> We've talked about it a couple times before; but the
> requirements/needs/wishes never sunk in deep enough in my head to do
> anything about it.  So:  How do you use this?
>
> My thoughts all along have been to just have some way to make a "find user
> data" plugin that'll fetch a per-recipient blob of information that other
> plugins can, uh, plug into.
>
> The big stumbling block is what the common infrastructure for dealing with
> diverting needs of the recipients should be.   What would we need in terms
> of per-recipient header rewriting, post-DATA processing etc?


It is kind of a big scary can of worms.  In large part because AFAICT, a
number of shops including my own have grown their own solutions, often
leaving things just as forked as we are.  It seems that everybody's doing
it in different, incompatible ways which each shop is fully pleased with
and would find it difficult to move away from, and really each method
seems valid enough.  It may be workable to try to support multiple
different treatments, or it may be necessary to choose one and go with it.
 The latter would mean that some existing users might just remain forked,
but if you made the right choices then probably new users would never
bother to move away.  There was a time when this was more at the forefront
of my mind and I was interested enough to 'put together a proposal'.  I
sent some ideas to the list at the time, but now I'm not so sure we even
follow those ideas in our own fork; we've moved beyond, and yet what we
have now is still a work in progress.  So to 'do it right the first time'
some new thinking would have to take place.  But FWIW, here's a bit about
what we do now:

- Qpsmtpd::Address has a config() method which passes through to plugins
that call 'hook_user_config'.  Note that I used to call this
'hook_rcpt_config' but really it's quite valid to call
$txn->sender->config(...) when you've determined that this is an outbound
message and you want to know something about the sender's preferences
instead.  The way I implemented this is a bit of a mess.  At any rate I
think I submitted a patch to do this sort of thing, but I might not have
managed to stick with the review process long enough to answer everyone's
concerns/questions.  Or maybe it was accepted!  I can't really remember ;)

- Qpsmtpd::DSN is heavily modified so that it can be used to store
information in Qpsmtpd::Address (which itself has methods to manipulate
such things) about what we have decided we want to do with each recipient
and why we have chosen to do it; e.g. ( Rejected => 'Spam' ).

- Qpsmtpd::Transaction has been enhanced so that recipients() returns only
the recipients that we have (thus far) decided we're going to be
delivering to, and all_recipients() and rejected_recipients() have been
added to give information about recipients that we have decided not to
accept.

Keeping these lists around is useful for logging but may also be necessary
depending on how one decides to deal with the issue of "we've reached the
end of DATA and want to reject this recipient and accept this one based on
their different preferences, but we can only give one response". 
Obviously we respond with '250' in this case; as for the 'rejected'
recipients, some choose to drop them silently; some have a 'quarantine'
method in place and choose to quarantine for these recipients; some choose
to bounce (this last method is the most reviled, but what can I say?  I
don't want to do it, but my lead developer does.  And the situation is
pretty rare anyhow).

The content of these methods in our forked code might shed a little light
on how they work with the DSN data stored in each recipient:


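sub recipients {
  my $self = shift;
  # optional setter: replace the recipient list
  @_ and $self->{_recipients} = [@_];
  return () unless $self->{_recipients};
  # keep recipients with no DSN yet, or whose DSN action still means "deliver"
  return grep {
    ! $_->dsn
      or $_->dsn->action ~~ [qw( Accepted Delivered Queued Quarantined )]
  } @{$self->{_recipients}};
}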

sub rejected_recipients {
  my $self = shift;
  @_ and $self->{_rejected_recipients} = [@_];
  return () unless $self->{_recipients};
  # parenthesized because '!' binds tighter than '~~'
  return grep {
    $_->dsn
      and !( $_->dsn->action ~~ [qw( Delivered Queued Quarantined )] )
  } @{$self->{_recipients}};
}

sub all_recipients {
  my $self = shift;
  return () unless $self->{_recipients};
  return @{$self->{_recipients}};
}

- All of our per-recip-pref-aware post-data scanning plugins loop through
each to-be-accepted recipient and determine what we want to do for each
recipient.  Then a single plugin afterward handles the results and
responding to the client.  So basically, an empty $txn->recipients()
becomes a short-circuit for whatever post-data plugins are left in the
mix.  We actually don't do this with the DSN objects; we use a separate
'class' note to denote that something is 'spam', 'clean', 'whitelisted',
etc.  Then the last plugin sets the DSNs for each recipient.  Maybe we
should have figured out a way to do that with the DSN objects, idunno.

- Each recipient object can optionally have its own $rcpt->notes('header')
object (though this probably ought to be a header() method, really) and
its own body object.  In our case, we actually parse *all* MIME data with
MIME::Parser, so this body object is a $rcpt->notes('mime_body'), but
really this should probably be something more generic that has the same
accessors as transaction bodies; perhaps there should even be a
Qpsmtpd::Body and $txn->body_* should become $txn->body->* ?  Some were
just talking previously about the possibility of having MIME parsing as an
option, so it may actually be worthwhile to officially have an
$rcpt->mime_body (or Qpsmtpd::Body::mime()?) as well which can be used if
MIME has been parsed.

- In postfix-queue (we haven't modified any of the other queue plugins but
if this was the method chosen all of them would have to be modified), we
loop through every recipient and queue separately for *each one*.  This
has become necessary for our own product, unfortunately, since we have
unique headers for each recipient -- something that may very well wind up
happening if you do things like adding a header with the SA score for each
recipient and every recipient has different whitelists and other
preferences.  We used to have grouping though, and that would be easy
enough to do -- queue separately for every recipient that has its own
header or body object set, and then queue the rest that fell back to
$txn->body and $txn->header all at once.
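
A minimal sketch of that grouping idea, purely for illustration.  It uses
plain hashrefs in place of Qpsmtpd::Address objects, and the 'header' and
'body' keys and the group_for_queueing() name are hypothetical stand-ins,
not qpsmtpd API:

```perl
use strict;
use warnings;

# Hypothetical sketch: recipients are plain hashrefs here.  A recipient
# with its own 'header' or 'body' has to be queued alone; everyone else
# can share one queue operation using the transaction-wide header/body.
sub group_for_queueing {
    my @recipients = @_;
    my ( @individual, @shared );
    for my $rcpt (@recipients) {
        if ( defined $rcpt->{header} or defined $rcpt->{body} ) {
            push @individual, [$rcpt];    # a batch of exactly one
        }
        else {
            push @shared, $rcpt;
        }
    }
    # One batch per customized recipient, then one batch for the rest
    # (omitted entirely if nobody fell back to the shared message).
    return ( @individual, ( @shared ? [@shared] : () ) );
}

my @batches = group_for_queueing(
    { address => 'alice@example.com', header => "X-Spam-Score: 1.2\n" },
    { address => 'bob@example.com' },
    { address => 'carol@example.com' },
);
printf "%d batches\n", scalar @batches;    # prints "2 batches"
```

Each returned batch would then correspond to one queue operation, so the
cost only grows with the number of recipients that really differ.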

One can imagine the increased overhead involved in queueing each recipient
separately.  The postfix queue is bigger, yes, but we have had other
unexpected results, especially in instances where we are queueing to a
remote postfix over even the local network.  We have actually had to add a
limit_recipients plugin that takes the size specified at MAIL FROM
(lacking that it assumes something like 5 or 10 MB to be safe) and makes
sure that we defer all recipients after we have gotten to the point where
we would be transmitting over 200MB to Postfix.  Before we did this, it
was quite possible that we could sit around for so long queueing to
postfix before we knew that every recipient was queued, that the client
gave up on us and then of course tried again later and we wound up with
duplicated messages.  Sheesh!  We haven't seen this since we tried to
implement live delivery, which by the way we gave the heck up on long ago.
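
The arithmetic behind that limit can be sketched as follows.  This is an
assumption-laden illustration, not our actual limit_recipients plugin:
the helper name, the 10MB default and the way the count is passed in are
all hypothetical; only the 200MB cap comes from the description above.

```perl
use strict;
use warnings;

# Assumed size when MAIL FROM carried no SIZE parameter (hypothetical value).
use constant DEFAULT_SIZE => 10 * 1024 * 1024;
# Total bytes we are willing to transmit to the queue for one transaction.
use constant MAX_TRANSFER => 200 * 1024 * 1024;

# Hypothetical helper: true if accepting one more recipient would push
# (message size x recipient count) past the cap, since every recipient
# is queued as a separate copy of the message.
sub should_defer_rcpt {
    my ( $declared_size, $accepted_count ) = @_;
    my $size  = $declared_size || DEFAULT_SIZE;
    my $total = ( $accepted_count + 1 ) * $size;
    return $total > MAX_TRANSFER;
}

# A 5MB message with 40 recipients already accepted: the 41st would
# bring the total to 205MB, so it gets deferred.
print should_defer_rcpt( 5 * 1024 * 1024, 40 ) ? "defer\n" : "accept\n";
```

In a real plugin a check like this would presumably run at RCPT time and
answer with a temporary failure, so the client retries the remaining
recipients in a later transaction.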

That's all I can think of right now regarding what we're doing ourselves. 
Another big question is how to deal with people who don't really need
per-recip handling and all the trouble it comes with.  One thing I
proposed at one point was that $txn->recipients() could just return a
single meta-recipient so that plugins could be written assuming per-recip
was enabled, and if it was turned off they would just run once through the
loop.  This does seem silly now though: what if we did $rcpt->dsn(
Rejected => "We don't like this recipient" ) for grep { $_->address =~
/bob/ } $txn->recipients?  Maybe the thing to do is just go ahead and make
the stock plugins aware of the possibility of either setting and deal with
it themselves; e.g.:

unless ( $self->qp->config('per_recip_is_on') ) {
    my $uid = 0; # global
    return DENY, "We hate you" if $self->is_spam( $txn, $uid );
    return DECLINED;
}
for my $rcpt ( $txn->recipients ) {
    # only scan things that previous plugins haven't figured out yet,
    # or that previous plugins want to quarantine
    # (we'll try get a more definite answer for those)
    next if ( $rcpt->class // 'quarantine' ) ne 'quarantine';
    next unless $rcpt->config('enable_spam_scanning');
    $rcpt->class('spam') if $self->is_spam( $txn, $rcpt->notes('uid') );
}

That $txn->recipients loop, by the way, is pretty much how we do things in
our forked plugins.  Then the very last post-data plugin does something
like this:

for my $rcpt ( $txn->recipients ) {
    my $class = $rcpt->class // '';
    # exactly one DSN per recipient, chosen by its class
    if    ( $class eq 'spam' )       { $rcpt->dsn( Rejected    => 'Spam' ) }
    elsif ( $class eq 'quarantine' ) { $rcpt->dsn( Quarantined => 'Spam' ) }
    else                             { $rcpt->dsn( Queued      => 'Clean' ) }
}
if ( $txn->recipients ) { # there are some recips left to queue or quarantine
    # _response() returns something like (OK, '250 Queued!')
    return Qpsmtpd::DSN->new(Queued => 'Clean')->_response;
} else { # no recipients left to be accepted, let's reject
    # showing off another Qpsmtpd::DSN method
    return DENY, ($txn->rejected_recipients)[0]->dsn->smtp_text;
}



What's troubling to me is that the existence of a per-recip and
non-per-recip mode with fundamental differences in how we handle things
seems so very much like the existing 'async daemon' and 'everything else'
targets.  Done wrong, we could end up with four total targets to write
plugins for, and more 'wait, does this work with per-recipient mode?  i
don't know i don't use that yet but i hear it's awesome' kinda like we
have with async, or worse a plugins/per_recip folder.  But per-recip is
not _that_ huge a difference as async vs. blocking, so maybe it's much ado
about nothing.  Perhaps it would even be good to use plugin inheritance as
(I just discovered recently) many async plugins do, to avoid actually
duplicating that much code and allow the 'basic' plugins to be that much
simpler...

Hope That Helps ;)

-Jared
