Wietse:
On 5/13/2013 1:28 PM, Wietse Venema wrote:
Curtis:
We are seeing an intermittent issue in our Postfix logs where we see all
outbound threads (smtp) stop delivering email or logging anything while
the active queue continues to grow.
There are many ways this can happen.
- One example is that all mail is sent to the deferred queue.
The deferred queue does not grow significantly during the incident.
Only the active queue grows.
- Another example is that your queue manager has stopped.
We'll definitely try to do an strace on qmgr to see what it's doing if
we can catch it happening. qmgr doesn't seem to have stopped, however,
as we see plenty of logging from postfix/qmgr in the logs after
postfix/smtp entries have stopped.
And there are a bazillion other examples. This requires
that you learn to read Postfix logfiles.
http://www.postfix.org/DEBUG_README.html#logging
We have already searched the logs for problems
(warning|error|fatal|panic) and haven't found anything out of the
ordinary yet.
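For reference, the kind of search described above can be sketched in Python. This is only a rough illustration, not how Postfix or DEBUG_README does it: the function name find_problems and the regular expression are my own assumptions, and the sample lines are the masked entries quoted later in this message.

```python
import re

# Assumption: Postfix writes the severity keyword right after the
# "postfix/<daemon>[<pid>]: " prefix, e.g. "postfix/smtp[1114]: warning: ...".
PROBLEM = re.compile(
    r"postfix[^/\s]*/[^\s\[]+\[\d+\]: (warning|error|fatal|panic):"
)


def find_problems(lines):
    """Return (severity, line) pairs for log lines that report a problem."""
    hits = []
    for line in lines:
        match = PROBLEM.search(line)
        if match:
            hits.append((match.group(1), line.rstrip("\n")))
    return hits


# Masked log entries taken from this thread (line wrapping removed).
thread_lines = [
    "May 9 13:36:50 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: conversation "
    "with ---.---.com[---.---.---.---] timed out while sending message body",
    "May 9 13:36:51 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: enabling PIX "
    "workarounds: disable_esmtp delay_dotcrlf for "
    "---.---.com[---.---.---.---]:25",
]

if __name__ == "__main__":
    for severity, line in find_problems(thread_lines):
        print(severity, line)
```

Note that the timeout entries shown further down carry no severity keyword, so a search like this would not flag them; that may be why it turned up nothing unusual.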
In all cases, when no mail is given to the smtp clients, they are
out of work and terminate after 100s.
OK, we have confirmed that the postfix/smtp threads are not just
hanging. After several minutes of logging nothing, each thread exits
with log entries that look like this (real host names/IPs masked with
---):
May 9 13:36:50 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: conversation with ---.---.com[---.---.---.---] timed out while sending message body
May 9 13:36:51 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: enabling PIX workarounds: disable_esmtp delay_dotcrlf for ---.---.com[---.---.---.---]:25
May 9 13:46:53 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: to=<---@---.com>, relay=---.---.com[---.---.---.---]:25, delay=349260, delays=348054/0.01/604/602, dsn=4.4.2, status=deferred (conversation with ---.---.com[---.---.---.---] timed out while sending message body)
In every case it is the same mail host that each active thread is
trying to deliver to and eventually times out on.
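To make the timing in the status=deferred entry above easier to read, here is a minimal Python sketch that pulls out the relay and the delays=a/b/c/d breakdown. As I understand the Postfix documentation, a is the time before the queue manager, b the time in the queue manager, c the connection setup time, and d the message transmission time. The function name parse_delivery and the dictionary keys are hypothetical names of my own.

```python
import re

# Matches the relay/delay/delays/dsn/status fields of a delivery log entry.
DELIVERY = re.compile(
    r"relay=(?P<relay>[^,]+), delay=(?P<delay>[\d.]+), "
    r"delays=(?P<delays>[\d./]+), dsn=(?P<dsn>[\d.]+), "
    r"status=(?P<status>\w+)"
)


def parse_delivery(line):
    """Return the delivery fields of one log line as a dict, or None."""
    match = DELIVERY.search(line)
    if match is None:
        return None
    # delays=a/b/c/d, all in seconds.
    a, b, c, d = (float(part) for part in match.group("delays").split("/"))
    return {
        "relay": match.group("relay"),
        "delay": float(match.group("delay")),
        "before_qmgr": a,   # time before the queue manager
        "in_qmgr": b,       # time in the queue manager
        "conn_setup": c,    # connection setup time
        "transmission": d,  # message transmission time
        "dsn": match.group("dsn"),
        "status": match.group("status"),
    }


# The masked entry from this thread, with the line wrapping removed.
thread_line = (
    "May 9 13:46:53 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: "
    "to=<---@---.com>, relay=---.---.com[---.---.---.---]:25, delay=349260, "
    "delays=348054/0.01/604/602, dsn=4.4.2, status=deferred (conversation "
    "with ---.---.com[---.---.---.---] timed out while sending message body)"
)

if __name__ == "__main__":
    info = parse_delivery(thread_line)
    # Seconds this smtp process spent on the one delivery attempt.
    print(info["relay"], info["conn_setup"] + info["transmission"])
```

Read this way, the entry suggests the smtp process spent roughly 604 + 602 seconds on this single attempt, while almost all of the total delay accumulated before the message reached the queue manager again. Grouping the parsed entries by relay would be one way to confirm that every stalled thread is talking to the same host.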
So, if qmgr is still running, my question remains the same: since the
active queue is growing, what are possible reasons why new smtp
threads would not be spawned until every last active thread gives up
on this non-responsive mail server?
It feels like something has told Postfix to restart gracefully, and so
it is waiting for every smtp thread to exit before it restarts.
However, if that's the case, why does Postfix instantly start spawning
smtp threads again when I issue a "postfix stop" followed by a
"postfix start"? If I don't restart Postfix, it seems to start
spawning smtp threads again only after every last thread has given up
on the non-responsive mail server.
Curtis
Wietse