On 8/25/2011 9:10 PM, Alister West wrote:
> Hi qpsmtpd,
> I came across an entry in the mail-archive link where Chris Lewis
> mentions he has customised qpsmtp-async for forwarding to a list of
> other servers.
> http://www.nntp.perl.org/group/perl.qpsmtpd/2009/07/msg8919.html
> Chris> My qpsmtp-async forwarder has a config list of IPs (and ports).
> I am about to write something that does something very similar (a
> load-balancer for smtp). As I'm not very familiar with qpsmtp I was
> wondering if Chris is still active here and if he would mind sharing
> his altered forwarder so I could use it as base for my project.
> Any other suggestions, or modifications to qpsmtp-async,
> plugins/..smtp-forward, etc. welcome.
My forwarder itself wasn't async, and unfortunately I can't share it
either. You probably wouldn't want it because it's coded to .40 or
thereabouts.
[Development is stopped on our qpsmtpd. The systems are being shut down
in a few months as the company closes shop. Unfortunately, this code
has to go down with the ship...]
Here's what I would do to get everything I had:
Take the async/queue/smtp-forward in the release, and do the following
things:
1) Alter the parameter parse to treat _smtp_server as:
server:port,server:port....
Get rid of the stupid die() in new(), and just return undef.
Replace the invocation of new() in start_queue with something like this:
    my $servers = $self->{_smtp_server};
    for my $s (split(/,/, $servers)) {
        my ($server, $port) = split(/:/, $s);
        $port = 25 if !$port;
        # new() now returns undef when the connect fails
        my $sock = AsyncSMTPSender->new($server, $port, $qp, $self,
                                        $transaction);
        if ($sock) {
            $transaction->notes('async_sender', $sock);
            return YIELD;
        }
    }
    # no server accepted the connect: tempfail so the client retries
    return DENYSOFT;
The above will failover thru the list of servers until you get a
connect. If that's all you need, you're done except for checking out (2).
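
If you want to check the server:port,server:port parsing on its own before wiring it into the plugin, here is a minimal standalone sketch. The helper name parse_server_list and the sample addresses are made up for illustration; nothing here is qpsmtpd API. It defaults the port to 25 the same way the loop above does, and it does not try to handle IPv6 literals (they contain colons).

```perl
use strict;
use warnings;

# Hypothetical helper: split "server:port,server:port" into [host, port] pairs.
sub parse_server_list {
    my ($spec) = @_;
    my @pairs;
    for my $entry (split /,/, $spec) {
        $entry =~ s/^\s+|\s+$//g;    # tolerate stray whitespace around entries
        next if $entry eq '';        # skip empty entries such as a trailing comma
        my ($host, $port) = split /:/, $entry, 2;
        $port = 25 if !$port;        # default SMTP port when none is given
        push @pairs, [$host, $port];
    }
    return @pairs;
}

# Sample addresses are from the documentation range, not real forwarders.
for my $pair (parse_server_list('192.0.2.1:2525, 192.0.2.2')) {
    printf "%s port %d\n", @$pair;
}
```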
2) The forwarders in qpsmtpd appear to be remarkably lax in parsing SMTP
return codes. What you need to do is make _sure_ that the various SMTP
states return codes _properly_ to the connecting clients. You MUST
properly return tempfails (for example) from the forwarded server,
tempfails from timeouts (does this thing handle timeouts at all?) and
make sure you behave properly if you get a PERM fail from a RCPT TO
which may be just one of many successful RCPT TOs. I had to do a lot of
work on the original 0.40 forwarder to make sure it handled returns
properly.
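
To make the return-code point concrete, here is a rough sketch of the kind of classification I mean. The helper name classify_reply and the three labels are mine, not anything in the shipped forwarder. In a plugin you would map 'ok' to OK, 'tempfail' to DENYSOFT and 'permfail' to DENY, and you would do it per RCPT TO so that one rejected recipient does not sink the others. Treating a missing or unparseable reply as a tempfail is my assumption about the safe default.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what the connecting client should be told,
# based on the reply line received from the forwarded-to server.
sub classify_reply {
    my ($line) = @_;
    return 'tempfail' if !defined $line;    # timeout or dropped connection
    my ($class) = $line =~ /^([2-5])[0-9]{2}(?![0-9])/;
    return 'tempfail' if !defined $class;   # garbage reply: let the client retry
    return 'ok'       if $class eq '2' || $class eq '3';  # 3xx is e.g. 354 after DATA
    return 'tempfail' if $class eq '4';
    return 'permfail';                      # 5xx
}

# One reply per recipient; a 5xx for one of them must not affect the rest.
my %rcpt_reply = (
    'a@example.com' => '250 2.1.5 Ok',
    'b@example.com' => '550 5.1.1 No such user',
    'c@example.com' => undef,               # backend timed out
);
for my $rcpt (sort keys %rcpt_reply) {
    printf "%-16s %s\n", $rcpt, classify_reply($rcpt_reply{$rcpt});
}
```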
3) Load balancing: Well, all you need to do is sort the list of smtp
servers before trying them. What I did was have a $self variable (set
in init) that contained a hash. The hash was keyed by server:port
pairs, and had zero as initial value. I used that as the internal
representation of the server list (instead of $self->{_smtp_server}).
You'd initialize (in the init()) like so (using config/smtpforwarders to
contain a server:port pair per line - I like config file variables over
config/plugins lines):
    # one "server:port" per line in config/smtpforwarders, each starting at 0
    my %servers = map { $_ => 0 } $self->qp->config('smtpforwarders');
    $self->{_smtp_servers} = \%servers;
Each time I ended the "send an email" chain, I added how long it took
(in seconds) to send the email to the value for the server I used (I
remembered the server:port pair in a transaction note). Then, just
before I started the connect loop, I sorted the list of server:port
pairs by ascending value.
You'd be replacing the for loop above with something like this:
    my %servers = %{$self->{_smtp_servers}};
    # lowest accumulated time first
    my @servers = sort { $servers{$a} <=> $servers{$b} } keys %servers;
    for my $s (@servers) {
If I had a timeout, I added something like 300 seconds to the value. If
I had a tempfail, I added something like 30 to the value.
Whenever the value hit some highish value (say, 1800), I reset it to
zero (before sorting).
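
Pulled together, the bookkeeping looks roughly like the standalone sketch below. The helper names record_result and pick_order are invented for illustration, the constants are the "something like" values mentioned above, and two things are my assumptions: that the penalty is added on top of the elapsed time, and that ties are broken by name.

```perl
use strict;
use warnings;

# Assumed values taken from the description; tune them for your own setup.
my %PENALTY  = (timeout => 300, tempfail => 30);
my $RESET_AT = 1800;

# Hypothetical helper: add elapsed seconds, plus any penalty, to a server's score.
sub record_result {
    my ($scores, $server, $elapsed, $outcome) = @_;
    $scores->{$server} += $elapsed + ($PENALTY{$outcome} // 0);
    return $scores->{$server};
}

# Hypothetical helper: reset scores that hit the ceiling, then return the
# server:port pairs with the lowest score first.
sub pick_order {
    my ($scores) = @_;
    for my $server (keys %$scores) {
        $scores->{$server} = 0 if $scores->{$server} >= $RESET_AT;
    }
    return sort { $scores->{$a} <=> $scores->{$b} || $a cmp $b } keys %$scores;
}

my %scores = ('local:25' => 0, 'remote:25' => 0);
record_result(\%scores, 'local:25',  1,  'ok');
record_result(\%scores, 'remote:25', 4,  'ok');
record_result(\%scores, 'remote:25', 30, 'timeout');
print join(', ', pick_order(\%scores)), "\n";
```

In the plugin the hash would live in $self->{_smtp_servers}, and record_result would be called wherever the conversation with the forwarder ends.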
It was kinda cool watching this simple scheduler in action. Servers
tended to be invoked in inverse proportion to how long they usually
took. In our case, with qpsmtpds in two separate locations, with two
forwarders in each location, the "local" forwarders tended to move 3-4
times as much email as the "more remote" forwarders from the perspective
of each qpsmtpd. IOW: it tended to pick the server that provided the
lowest latency. Servers that threw tempfails got penalized a bit so we
didn't get tricked into optimizing on servers that threw a lot of them fast.
The reset to zero was to make sure that you don't give up trying a stuck
server. In our case, it tended to force a try to a thoroughly stuck
server every 10 minutes or so, but it would then rapidly go to the end
of the list again.
You have to remember to set the timeouts on each SMTP phase to be _less
than_ <standard timeout>/N, where N is the number of forwarders. The
standard timeouts are on the order of 300 seconds. With 4 servers, that
gives a limit of 75 seconds, and our SMTP step timeouts (per server)
were 30 seconds. You don't want the client giving up on you while you're
still trying other servers...
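
The arithmetic is trivial, but a sanity check in init() does no harm. A small sketch, with made-up helper names and the 300 second figure taken as the assumed client-side timeout:

```perl
use strict;
use warnings;

# Hypothetical helper: upper bound for the per-server step timeout.
sub max_step_timeout {
    my ($client_timeout, $n_servers) = @_;
    die "need at least one server\n" if $n_servers < 1;
    return $client_timeout / $n_servers;
}

# Hypothetical helper: true if trying every server in turn still fits
# inside the client's own timeout.
sub timeout_fits {
    my ($step_timeout, $client_timeout, $n_servers) = @_;
    return $step_timeout < max_step_timeout($client_timeout, $n_servers) ? 1 : 0;
}

printf "limit with 4 servers: %d seconds\n", max_step_timeout(300, 4);
print "30 second steps fit\n" if timeout_fits(30, 300, 4);
```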
Note that without async, each fork() has its own copy of the sorted
list. This _isn't_ a problem. With async, I think you're safe as long as
you do the mucking about with the $self->{_smtp_servers} variable
without doing any event waits while it's in an inconsistent state.
Mine did other things like indicate in the log which server actually got
forwarded to, etc. etc. etc.
Note: the above does _not_ attempt to do any failover if the failure
comes _after_ the connect has succeeded. Only a failed or timed-out
connect moves on to the next server. For anything later it relies on a
DENYSOFT to get the client to try again. Worked well for us (at up to
600K forwards/day on 4 servers). To have it retry, it'd have to replay
the conversation with another server. Ick.
We never did notice a forwarder going down from the mail flow itself,
even when, in one unusual situation, the forwarders went down relative
to their local qpsmtpds, requiring failover to the opposite forwarders
(weird router foulup, never mind). We had to set up external monitoring
to notice that.
Another way of doing this, without the complexity of hashes, sorts etc,
is to externally monitor the health of the forwarders, and shuffle the
config/smtpforwarders file periodically. You have to use config() to
pick up smtp forwarders just before doing the connect loop.
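
For the external-monitor variant, the monitor side could be as small as the sketch below. It is only an illustration: the helper name write_forwarders and the shape of the health hash are my invention, and how you actually probe the forwarders is up to you. It writes the healthy servers first and renames the file into place so the plugin never reads a half-written list.

```perl
use strict;
use warnings;

# Hypothetical monitor-side helper: rewrite the forwarder list, healthy first.
# $health is { "server:port" => 1 (up) or 0 (down) }.
sub write_forwarders {
    my ($path, $health) = @_;
    my @ordered = sort { $health->{$b} <=> $health->{$a} || $a cmp $b }
                  keys %$health;
    my $tmp = "$path.tmp.$$";
    open my $fh, '>', $tmp or die "cannot write $tmp: $!";
    print {$fh} "$_\n" for @ordered;
    close $fh or die "cannot close $tmp: $!";
    # rename within one directory replaces the old file in a single step
    rename $tmp, $path or die "cannot rename $tmp to $path: $!";
    return @ordered;
}

# Example call (path and results are placeholders):
# write_forwarders('config/smtpforwarders', { 'fwd1:25' => 1, 'fwd2:25' => 0 });
```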