Re: Taking over a mail queue from another node

Stefan Foerster Thu, 28 Jan 2010 09:40:22 -0800

* Victor Duchovni <victor.ducho...@morganstanley.com>:
> On Thu, Jan 28, 2010 at 06:13:33PM +0100, Stefan Foerster wrote:
> > If in a mail cluster, with multiple machines having access to a shared
> > storage device (SAN, iSCSI) which is presented to the host as a normal
> > block device (e.g. /dev/sda, hosting a normal ext3 filesystem), one of
> > the mail nodes fails, what are the necessary Postfix steps to take
> > over the queue on another host?
> > 
> > I _think_ it is sufficient to provide the same configuration files as
> > on the node which failed,
> 
> If path names for the queue, data and configuration directory are different,
> you may need to adjust these in the config files.


Well, that's kind of obvious :-)

> > execute "postsuper -s" until the queue file
> > names stop changing (which shouldn't happen at all, because it is the
> > same physical filesystem)
> 
> Only needed when restoring from backups, copying queue files, ... Not
> needed when mounting a filesystem.

I think the manpage for postsuper recommends executing it at least
once before starting up Postfix. Can it do any harm in this specific
scenario?

> > What would happen to mails which weren't completely received when the
> > original node crashed? Can I prevent qmgr from trying to deliver
> > those?
> 
> Nothing needs to be done.

This one was giving me a headache. Good to know, thank you.

One last thing: If the clocks are perfectly synchronized and the
takeover doesn't happen immediately but, e.g., after 60 minutes
(virtualized system, dynamic resource/node allocation), the deferred
queue could hold a large number of messages that are all due for a
delivery retry at once. Or, to quote QSHAPE_README:

,----
| When a host with lots of deferred mail is down for some time, it is
| possible for the entire deferred queue to reach its retry time
| simultaneously. This can lead to a very full active queue once the
| host comes back up. The phenomenon can repeat approximately every
| maximal_backoff_time seconds if the messages are again deferred after
| a brief burst of congestion.
`----

If the node doesn't have to process any new incoming mail, will qmgr
be able to handle six-digit deferred queues?
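
To put a rough number on that concern, here is a back-of-the-envelope
sketch in Python. It is not Postfix code and says nothing about how
qmgr actually schedules work. The assumptions are mine: 100,000
deferred messages, next-retry times spread evenly over one
maximal_backoff_time window of 4000 seconds, a 300 second interval
between deferred queue scans, and a 60 minute outage. The function
name due_count is made up for the illustration.

```python
# Toy model of deferred-queue retry times (illustration only, not Postfix code).

MAX_BACKOFF = 4000      # assumed maximal_backoff_time, in seconds
SCAN_INTERVAL = 300     # assumed interval between deferred queue scans, in seconds
OUTAGE = 60 * 60        # takeover happens 60 minutes after the crash
N_MESSAGES = 100_000    # assumed "six-digit" deferred queue

# Assumption: at the moment of the crash (t = 0), next-retry times are
# spread evenly over one maximal backoff window.
retry_times = [i * MAX_BACKOFF / N_MESSAGES for i in range(N_MESSAGES)]


def due_count(times, now):
    """Return how many messages have a retry time at or before `now`."""
    return sum(1 for t in times if t <= now)


# Messages that would become due in one ordinary scan interval.
normal_load = due_count(retry_times, SCAN_INTERVAL)

# Messages that are already due when the surviving node takes over.
takeover_load = due_count(retry_times, OUTAGE)

print(f"due in a normal scan interval: {normal_load}")
print(f"due at takeover after outage:  {takeover_load}")
print(f"share of queue due at once:    {takeover_load / N_MESSAGES:.0%}")
```

Under these assumptions, roughly nine tenths of the deferred queue is
eligible for retry the moment the second node starts, compared with
well under a tenth in a normal scan interval.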


Stefan
