Wietse:

On 5/13/2013 1:28 PM, Wietse Venema wrote:
Curtis:
We are seeing an intermittent issue in our Postfix logs where we see all
outbound threads (smtp) stop delivering email or logging anything while
the active queue continues to grow.
There are many ways this can happen.

- One example is that all mail is sent to the deferred queue.

The deferred queue does not grow significantly during the incident. It is just the active queue that grows.


- Another example is that your queue manager has stopped.

We'll definitely try to do an strace on qmgr to see what it's doing if we can catch it happening. qmgr doesn't seem to have stopped, however, as we see plenty of logging from postfix/qmgr in the logs after postfix/smtp entries have stopped.


And there are a bazillion other examples. This requires
that you learn to read Postfix logfiles.

http://www.postfix.org/DEBUG_README.html#logging

We have already searched the logs for issues (warning|error|fatal|panic) and haven't found anything out of the ordinary yet.
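For reference, the search we ran amounts to something like the following. This is only a minimal sketch: the function name find_problems is made up for illustration, and the pattern assumes the usual syslog layout in which the severity keyword follows directly after the "postfix/program[pid]:" tag.

```python
import re

# Assumption: severity keywords appear right after the "postfix/<program>[<pid>]: " tag.
# "postfix\S*" also tolerates instance names such as "postfix-out".
SEVERITY = re.compile(r"postfix\S*/\S+\[\d+\]: (warning|error|fatal|panic):")


def find_problems(lines):
    """Return (severity, line) pairs for log lines that carry a severity keyword."""
    hits = []
    for line in lines:
        match = SEVERITY.search(line)
        if match:
            hits.append((match.group(1), line.rstrip("\n")))
    return hits
```

It can be fed an open log file (the location of the mail log varies by system), and it returns an empty list when nothing matches, which is what we are seeing.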


In all cases, when no mail is given to the smtp clients, they are
out of work and terminate after 100s.

Ok, we have confirmed that the postfix/smtp threads are not just hanging. After several minutes of logging nothing, each thread exits with log entries that look like this (real host names/IPs masked with ---):

May 9 13:36:50 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: conversation with ---.---.com[---.---.---.---] timed out while sending message body
May 9 13:36:51 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: enabling PIX workarounds: disable_esmtp delay_dotcrlf for ---.---.com[---.---.---.---]:25
May 9 13:46:53 --- postfix/smtp[1114]: 3b3cyK07Bzz41vV6: to=<---@---.com>, relay=---.---.com[---.---.---.---]:25, delay=349260, delays=348054/0.01/604/602, dsn=4.4.2, status=deferred (conversation with ---.---.com[---.---.---.---] timed out while sending message body)

...with it being the same mail host that every active thread is trying to deliver to and eventually times out on.
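To make the timing in that last entry easier to read, here is a rough sketch that splits the delays=a/b/c/d field into its four parts. As I understand the Postfix logging format, a is the time before the queue manager, b the time in the queue manager, c the connection setup time and d the message transmission time. The function name parse_delays and the dictionary keys are my own labels, not anything Postfix defines.

```python
import re

# Matches the "delay=..., delays=a/b/c/d" portion of a delivery status line.
DELAYS = re.compile(
    r"delay=(?P<total>[\d.]+), "
    r"delays=(?P<a>[\d.]+)/(?P<b>[\d.]+)/(?P<c>[\d.]+)/(?P<d>[\d.]+)"
)


def parse_delays(line):
    """Return the delay breakdown in seconds, or None if the line has no delays field."""
    match = DELAYS.search(line)
    if match is None:
        return None
    return {
        "total": float(match.group("total")),
        "before_qmgr": float(match.group("a")),   # time before the queue manager
        "in_qmgr": float(match.group("b")),       # time in the queue manager
        "conn_setup": float(match.group("c")),    # connection setup
        "transmission": float(match.group("d")),  # message transmission
    }
```

Applied to the entry above (delays=348054/0.01/604/602), this gives 604 seconds of connection setup and 602 seconds of transmission, so each smtp process was tied up for roughly twenty minutes on this one delivery, while the message spent only 0.01 seconds in the queue manager.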

So, if qmgr is still running, then my question remains the same: since the active queue is growing, what are the possible reasons why new smtp threads would not be spawned until every last active thread gives up on this non-responsive mail server?

It feels like something has told Postfix to restart gracefully, and so it's just waiting for every smtp thread to exit before it restarts. However, if that's the case, then why is it that when I issue a "postfix stop" and "postfix start", Postfix instantly starts spawning smtp threads again? If I don't restart Postfix, it seems to start spawning smtp threads again only after it has waited until every last thread has given up on the non-responsive mail server.

Curtis


        Wietse


