Postfix 2.2, CentOS 4 (yes, I want to upgrade; can't for now).

Note: I have a course of action, but I'm not completely confident I
understand the problem, so I'm seeking other eyes on it. See bottom.

On a fallback relay serving several first-pass postfix servers, qmgr
sometimes seems to stall while mail is being relayed in.  It looks
like this:

A large mailing begins on the first-pass servers, and the fallback starts
receiving a lot of relayed mail.  At first, both the active and incoming
queues are growing, and mail is being delivered.  At some point:
 - qmgr stops moving messages from incoming to active
 - what's already in active stops being looked at at all
 - mail continues to pour into incoming
The active queue is *not* full when this happens; it has plenty of room.
(Plus, I've seen full active queues, and postfix doesn't behave like this.)
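To put numbers on "growing" and "plenty of room" during the next mailing, a minimal sketch like the one below could be run periodically. It assumes the standard incoming, active and deferred subdirectories under queue_directory, and it counts files with find because queue files may sit in hashed subdirectories depending on hash_queue_names. The function name queue_counts is made up for illustration. The active count can then be compared with the output of `postconf qmgr_message_active_limit`.

```shell
#!/bin/sh
# queue_counts QUEUE_DIR  (hypothetical helper)
# Prints one "queue count" line for each of the incoming, active and
# deferred queues by counting regular files anywhere beneath them
# (find descends into hashed subdirectories, if any).
queue_counts() {
    qdir=$1
    for q in incoming active deferred; do
        # tr strips any padding some wc implementations add
        n=$(find "$qdir/$q" -type f 2>/dev/null | wc -l | tr -d ' ')
        printf '%s %s\n' "$q" "$n"
    done
}
```

Run as root from something like `while sleep 60; do date; queue_counts "$(postconf queue_directory | sed 's/^[^=]*= *//')"; done`, this would timestamp the moment active stops changing while incoming keeps climbing.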

The mail log at this time is full of messages from smtpd and cleanup
showing new messages coming in, but nothing else.  There are no errors
at the onset to indicate that something is wrong.  Tracing qmgr shows
it's just waiting (probably for a message from master?):

# strace -p 31741
Process 31741 attached - interrupt to quit
futex(0x2a96b46930, FUTEX_WAIT, 2, NULL ^C<unfinished ...>
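Since qmgr logs a "(queue active)" line for each message it moves into the active queue, a log excerpt with plenty of cleanup lines and no qmgr lines at all is one way to spot the stall without watching it live. A rough sketch, assuming the usual "postfix/<daemon>[pid]:" syslog tag; daemon_counts and looks_stalled are hypothetical helper names:

```shell
#!/bin/sh
# daemon_counts (hypothetical helper)
# Reads syslog lines on stdin and prints "daemon count" for each Postfix
# daemon that logged, e.g. "cleanup 812", sorted by daemon name.
daemon_counts() {
    sed -n 's/.* postfix\/\([a-z-]*\)\[[0-9]*]:.*/\1/p' | sort | uniq -c |
        awk '{ print $2, $1 }'
}

# looks_stalled (hypothetical helper)
# Exits 0 when the excerpt on stdin shows cleanup activity (new mail
# arriving) but no qmgr lines at all, i.e. nothing moving to active.
looks_stalled() {
    counts=$(daemon_counts)
    echo "$counts" | grep -q '^cleanup ' && ! echo "$counts" | grep -q '^qmgr '
}
```

For example, `tail -n 2000 /var/log/maillog | looks_stalled && echo "qmgr looks stalled"` from cron, assuming the CentOS default log file. The 2000-line window is a guess and needs to cover more than a few seconds of traffic to avoid false alarms.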

When this happens, it stays that way until I catch it.  Reloading
postfix fixes the problem: from that point on, postfix moves messages
from incoming to active and attempts to deliver what's in active, even
though mail continues to be relayed in rapidly.

Also, this is intermittent.  It doesn't happen every time.  Often an
entire large mailing completes without this happening on the fallback
at all.

  .....

I can see that messages are coming in too quickly for postfix to
handle, and that I should increase in_flow_delay in main.cf, reduce the
number of smtpd processes in master.cf, or both, to slow it down.
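As a back-of-envelope check on how the two knobs interact, here is a sketch that assumes each of N concurrent smtpd sessions is held up for the full in_flow_delay before every message it accepts, and ignores all other per-message time. Under that simplifying assumption, inflow is capped near N / in_flow_delay messages per second. The helper name inflow_ceiling is hypothetical, and this is a rough model, not something Postfix itself computes.

```shell
#!/bin/sh
# inflow_ceiling PROCS DELAY_SECONDS  (hypothetical helper)
# Rough upper bound on accepted messages/second once flow control kicks
# in, ASSUMING every smtpd process waits the full delay per message.
#
# Where the knobs live:
#   main.cf    in_flow_delay = 1s   (the default; larger slows inflow)
#   master.cf  smtp  inet  n  -  n  -  -  smtpd
#              the 7th column is maxproc; "-" means default_process_limit
inflow_ceiling() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", n / d }'
}
```

For example, `inflow_ceiling 100 1` prints 100.0 and `inflow_ceiling 100 2` prints 50.0, so doubling in_flow_delay and halving maxproc shrink the ceiling by the same factor.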

However, I can't see the effect of these changes until the next large
mailing, and possibly not even then.  In the meantime, I want to try to
understand the problem better.  Since it's intermittent, I won't
necessarily know whether I've fixed it, even if I have.

* Is the behavior I'm seeing something I can expect when inflow is too fast?

 - What's qmgr waiting for, and why is it not happening?
 - Why does it take a reload to nudge it back into action?
Messages are coming in at about the same rate before and after the
reload, but before the reload qmgr is doing nothing (sometimes for
over an hour until I catch it); after the reload everything works.
  -- Cos