On 8/25/2011 9:10 PM, Alister West wrote:
Hi qpsmtpd,

I came across an entry in the mail-archive link where Chris Lewis
mentions he has customised qpsmtp-async for forwarding to a list of
other servers.

http://www.nntp.perl.org/group/perl.qpsmtpd/2009/07/msg8919.html

Chris>  My qpsmtp-async forwarder has a config list of IPs (and ports).

I am about to write something that does something very similar (a
load-balancer for smtp). As I'm not very familiar with qpsmtp I was
wondering if Chris is still active here and if he would mind sharing
his altered forwarder so I could use it as base for my project.

Any other suggestions, or modifications to qpsmtp-async,
plugins/..smtp-forward, etc. welcome.

My forwarder itself wasn't async, and unfortunately I can't share it either. You probably wouldn't want it anyway, because it's coded against 0.40 or thereabouts.

[Development is stopped on our qpsmtpd. The systems are being shut down in a few months as the company closes shop. Unfortunately, this code has to go down with the ship...]

Here is what I would do to get everything I had:

Take the async/queue/smtp-forward in the release, and do the following things:

Step 1:

Alter the parameter parse to treat _smtp_server as: server:port,server:port....

Get rid of the stupid die() in new(), and just return undef.

Replace the start_queue invocation of new() with something like this:

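# comma-separated list of server:port pairs
my $servers = $self->{_smtp_server};
for my $s (split(/,/, $servers)) {
      my ($server, $port) = split(/:/, $s);
      $port = 25 if !$port;
      # new() now returns undef instead of die()ing on a failed connect
      my $sock = AsyncSMTPSender->new($server, $port, $qp, $self, $transaction);
      if ($sock) {
           $transaction->notes('async_sender', $sock);
           return YIELD;
      }
}
# no server accepted the connect: tell the client to retry later
return DENYSOFT;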

The above will failover thru the list of servers until you get a connect. If that's all you need, you're done except for checking out (2).

2) the forwarders in qpsmtpd appear to be remarkably lax in parsing SMTP return codes. What you need to do is make _sure_ that the various SMTP states return codes _properly_ to the connecting clients. You MUST properly return tempfails (for example) from the forwarded server, tempfails from timeouts (does this thing handle timeouts at all?) and make sure you behave properly if you get a PERM fail from a RCPT TO which may be just one of many successful RCPT TOs. I had to do a lot of work on the original 0.40 forwarder to make sure it handled returns properly.
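As a rough illustration of the return-code handling described in (2), here is a minimal sketch of mapping an upstream reply onto what the connecting client should be told. The helper name classify_reply is hypothetical and not part of qpsmtpd, and it returns plain strings so it can run on its own; in a plugin you would return the matching constants instead. Treating a missing or unparseable reply as a tempfail is an assumption on my part.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of qpsmtpd): decide how to answer our own
# client based on one reply line from the forwarded-to server.
sub classify_reply {
    my ($line) = @_;

    # No reply at all (timeout, dropped connection): tempfail.
    return 'DENYSOFT' unless defined $line;

    # A reply starts with a three-digit code, then a space, a hyphen,
    # or the end of the line.
    my ($code) = $line =~ /^(\d{3})(?:[ -]|\s*\z)/
        or return 'DENYSOFT';    # unparseable reply: ask for a retry

    return 'OK'       if $code =~ /^[23]/;   # 2xx success, 3xx intermediate (354)
    return 'DENYSOFT' if $code =~ /^4/;      # 4xx transient failure
    return 'DENY'     if $code =~ /^5/;      # 5xx permanent failure
    return 'DENYSOFT';                       # anything else is unexpected
}
```

You would call this once per SMTP phase, and once per RCPT TO, so that a 5xx for one recipient rejects only that recipient and not the whole transaction.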

3) Load balancing: Well, all you need to do is sort the list of smtp servers before trying them. What I did was have a $self variable (set in init) that contained a hash. The hash was keyed by server:port pairs, and had zero as initial value. I used that as the internal representation of the server list (instead of $self->{_smtp_server}).

You'd initialize (in the init()) like so (using config/smtpforwarders to contain a server:port pair per line - I like config file variables over config/plugins lines):

# one server:port pair per line in config/smtpforwarders, each starting at 0
my %servers = map { ($_ => 0) }
           $self->qp->config('smtpforwarders');
$self->{_smtp_servers} = \%servers;

Each time I ended the "send an email chain", I added how long it took (in seconds) to send the email to the value for the server I used (I remembered the server:port pair in a transaction note). Then, just before I started the connect loop, I sorted the list of server:port pairs in ascending order of that value.

You'd be replacing the for loop above with something like this:

my %servers = %{$self->{_smtp_servers}};
# lowest accumulated time first
my @servers = sort {$servers{$a} <=> $servers{$b}} keys %servers;
for my $s (@servers) {

If I had a timeout, I added something like 300 seconds to the value. If I had a tempfail, I added something like 30 to the value.

Whenever the value hit some highish value (say, 1800), I reset it to zero (before sorting).
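The bookkeeping described above could be sketched roughly like this. The sub names record_result and ordered_servers are made up for illustration, the 300/30/1800 figures are the "something like" values mentioned above, and breaking ties by name is my own addition so the order is deterministic.

```perl
use strict;
use warnings;

# Assumed values, taken from the rough figures in the text.
my %PENALTY  = (timeout => 300, tempfail => 30);
my $RESET_AT = 1800;

# Add the elapsed seconds, plus any penalty for the outcome, to the
# score of the server:port pair that was used.
sub record_result {
    my ($servers, $key, $elapsed, $outcome) = @_;
    $servers->{$key} += $elapsed + ($PENALTY{$outcome} // 0);
    return;
}

# Reset any score that has reached the ceiling, then return the
# server:port keys with the lowest score first.
sub ordered_servers {
    my ($servers) = @_;
    for my $key (keys %$servers) {
        $servers->{$key} = 0 if $servers->{$key} >= $RESET_AT;
    }
    return sort { $servers->{$a} <=> $servers->{$b} or $a cmp $b }
           keys %$servers;
}
```

Here $servers would be the $self->{_smtp_servers} hash reference, and $key the server:port pair remembered in the transaction note.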

It was kinda cool watching this simple scheduler in action. Servers tended to be invoked inversely proportionally to how long they usually took. In our case, with qpsmtpds in two separate locations, with two forwarders in each location, the "local" forwarders tended to move 3-4 times as much email as the "more remote" forwarders from the perspective of each qpsmtpd. IOW: it tended to pick the server that provided the lowest latency. Servers that threw tempfails got penalized a bit so we didn't get tricked into optimizing on servers that threw a lot of them fast.

The reset to zero was to make sure that you don't give up trying a stuck server. In our case, it tended to force a try to a thoroughly stuck server every 10 minutes or so, but it would then rapidly go to the end of the list again.

You have to remember to set the timeouts on each SMTP phase to be _less than_ <standard timeout>/N, where N is the number of forwarders. The standard timeouts are on the order of 300 seconds. With 4 servers, our SMTP step timeouts (per server) were 30 seconds, well under 300/4 = 75. You don't want the client giving up on you while you're still trying other servers...
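The arithmetic can be written down as a tiny helper. This is only a sketch: per_server_timeout is a hypothetical name, and the default safety margin of 0.5 is an assumption chosen to stay comfortably below the limit, not a value from our setup.

```perl
use strict;
use warnings;

# Hypothetical helper: per-server timeout for each SMTP phase, so that
# trying every forwarder in turn still fits inside the client's timeout.
sub per_server_timeout {
    my ($client_timeout, $n_servers, $margin) = @_;
    $margin //= 0.5;    # assumed safety factor, not from the original setup
    die "need at least one server\n" if $n_servers < 1;
    return int($client_timeout / $n_servers * $margin);
}
```

With a 300 second client timeout and 4 forwarders this gives 37 seconds, in the same range as the 30 seconds mentioned above.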

Note that with the forking (non-async) servers, each fork() has its own copy of the sorted list. This _isn't_ a problem. With async, I think you're safe as long as you do the mucking about with the $self->{_smtp_servers} variable without doing any event waits while it's in an inconsistent state.

Mine did other things like indicate in the log which server actually got forwarded to, etc. etc. etc.

Note: the above does _not_ attempt to do any failover if the failure comes _after_ the connect; it only moves on to the next server when the connect itself fails or times out. For anything later, it relies on a DENYSOFT to get the client to try again. Worked well for us (at up to 600K forwards/day on 4 servers). To have it retry, it'd have to replay the conversation with another server. Ick.

We never did notice a forwarder going down in the mail flow. Even when, in one unusual situation, the forwarders went down relative to their local qpsmtpds, requiring failover to the opposite forwarders (weird router foulup, never mind). We had to set up external monitoring to do that.

Another way of doing this, without the complexity of hashes, sorts etc, is to externally monitor the health of the forwarders, and shuffle the config/smtpforwarders file periodically. You have to use config() to pick up smtp forwarders just before doing the connect loop.
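A minimal sketch of such an external monitor follows, assuming that "healthy" simply means a TCP connect succeeds. All three sub names are hypothetical, and reachable-first ordering is just one possible policy; the temp file plus rename() is there so qpsmtpd never reads a half-written file.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# True if a TCP connect to "server:port" succeeds within $timeout seconds.
sub is_reachable {
    my ($entry, $timeout) = @_;
    my ($host, $port) = split /:/, $entry;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port || 25,
        Proto    => 'tcp',
        Timeout  => $timeout,
    ) or return 0;
    close $sock;
    return 1;
}

# Put entries that pass the probe first, keeping the relative order
# within each group. $probe is any code ref returning true or false.
sub reorder {
    my ($entries, $probe) = @_;
    my (@up, @down);
    for my $entry (@$entries) {
        if ($probe->($entry)) { push @up, $entry }
        else                  { push @down, $entry }
    }
    return (@up, @down);
}

# Write the entries to a temp file in the same directory, then rename()
# it over the config file. Note tempfile() creates the file mode 0600,
# so adjust permissions if qpsmtpd runs as a different user.
sub rewrite_config {
    my ($path, @entries) = @_;
    my ($fh, $tmp) = tempfile(DIR => dirname($path));
    print {$fh} "$_\n" for @entries;
    close $fh or die "close $tmp: $!";
    rename $tmp, $path or die "rename $tmp -> $path: $!";
    return;
}
```

Run from cron or a small loop, this would read config/smtpforwarders, call reorder(\@entries, sub { is_reachable($_[0], 5) }), and pass the result to rewrite_config().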




