Postfix 2.2, CentOS 4 (yes, I want to upgrade; can't for now). Note: I have a course of action, but I'm not completely confident I understand the problem, so I'm seeking other eyes on it. See bottom.

On a fallback relay serving several first-pass postfix servers, qmgr sometimes seems to stop and rest while mail is being relayed in. It looks like this:

A large mailing begins on the first-pass servers, and the fallback starts receiving a lot of relayed mail. At first, both the active and incoming queues are growing, and mail is being delivered. At some point:

- qmgr stops moving messages from incoming to active
- what's already in active stops being looked at at all
- mail continues to pour into incoming

The active queue is *not* full when this happens; it has plenty of room. (Plus, I've seen full active queues, and postfix doesn't behave like this.)

The mail log at this time is full of messages from smtpd and cleanup, showing new messages coming in, but nothing else. There are no errors at the beginning indicating that something is wrong.

Tracing qmgr shows it's just waiting (probably for a message from master?):

    # strace -p 31741
    Process 31741 attached - interrupt to quit
    futex(0x2a96b46930, FUTEX_WAIT, 2, NULL ^C<unfinished ...>

When this happens, it stays that way until I catch it. Reloading postfix fixes the problem: from that point on postfix moves messages from incoming to active and attempts to deliver what's in active, even though mail continues to get relayed in rapidly.

Also, this is intermittent. It doesn't happen every time. Often an entire large mailing completes without this happening on the fallback at all.

.....

I can see that messages are coming in too quickly for postfix to handle, and that I should increase in_flow_delay, reduce the number of smtpd processes in master.cf, or both, to slow it down. However, I can't see the effect of these changes until the next large mailing, and maybe not even then: since the problem is intermittent, I won't necessarily know when I've fixed it even if I have. In the meantime, I want to try to understand the problem better.

* Is the behavior I'm seeing something I can expect when inflow is too fast?
  - What's qmgr waiting for, and why is it not happening?
  - Why does it take a reload to nudge it back into action? Messages are coming in at about the same rate before and after the reload, but before the reload qmgr is doing nothing (sometimes for over an hour until I catch it); after the reload everything works.

-- Cos
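
P.S. Since qmgr can sit idle for over an hour before I notice, here is a rough sketch of a watcher that might catch the stall sooner. It is only a heuristic built on the symptoms above: incoming keeps growing while the set of files in active does not change at all. Everything in it is an assumption rather than anything postfix provides: the queue root /var/spool/postfix, the function names, the interval and the strike count are all made up for illustration. It also assumes Python 3 is available, which is probably not the stock interpreter on CentOS 4.

```python
import os
import time

# Assumption: the default queue_directory; adjust to match main.cf.
QUEUE_ROOT = "/var/spool/postfix"


def queue_ids(queue_root, name):
    """Return the set of file names under queue_root/name.

    os.walk is used so that hashed subdirectories, if any, are included.
    """
    ids = set()
    for _dirpath, _dirnames, filenames in os.walk(os.path.join(queue_root, name)):
        ids.update(filenames)
    return ids


def snapshot(queue_root=QUEUE_ROOT):
    """Take one (incoming, active) snapshot of queue file names."""
    return queue_ids(queue_root, "incoming"), queue_ids(queue_root, "active")


def looks_stalled(prev, curr):
    """Heuristic only: incoming grew while a non-empty active queue stayed identical."""
    prev_incoming, prev_active = prev
    curr_incoming, curr_active = curr
    return (
        len(curr_incoming) > len(prev_incoming)
        and bool(curr_active)
        and curr_active == prev_active
    )


def watch(queue_root=QUEUE_ROOT, interval=60, strikes_needed=3):
    """Poll forever; print a warning after several consecutive suspicious samples."""
    strikes = 0
    prev = snapshot(queue_root)
    while True:
        time.sleep(interval)
        curr = snapshot(queue_root)
        strikes = strikes + 1 if looks_stalled(prev, curr) else 0
        if strikes >= strikes_needed:
            print("qmgr may be stalled: incoming=%d active=%d"
                  % (len(curr[0]), len(curr[1])))
        prev = curr
```

Calling watch() as a user who can read the queue directories (normally root) starts the polling loop. Requiring several consecutive strikes is meant to avoid flagging a short pause, but a genuinely slow destination could probably still trigger it.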