Wietse:
On 5/13/2013 3:10 PM, Wietse Venema wrote:
> Your outbound SMTP connections are timing out, because the receiving
> end runs a PIX/ASA "security" "firewall". These devices have a long
> history of breaking SMTP and that is why Postfix turns on PIX
> workarounds as logged above.
Yes, I'm familiar with that issue. Sorry, I didn't mean to make it
sound like I was asking you to interpret the logs for me. I was just
being too verbose. I was just pointing out that the smtp threads were
not hanging after all (as I originally thought).
> > So, if qmgr is still running, then my question remains the same... since
> > the active queue is growing what are possible reasons why new smtp
> > threads would not be spawning until every last active thread gives up on
> > this non-responsive mail server?
> See the first example in my first reply: all mail is sent to the
> deferred queue.
> $ grep 'status=deferred' /the/maillog/file
Sorry, I need to figure out how to simplify my question as the extra
details I've provided keep getting you focused on the wrong thing (I
understand that you're probably just skimming my emails... I'm impressed
that you have time to answer at all).
Yes, at the time of each incident, there are a few threads that
eventually time out and throw a few emails into the deferred queue. That
does not concern me. What concerns me is that while Postfix is waiting
for these few threads to time out, the active queue is completely
ignored and is growing rapidly.
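To show that the deferrals really are limited to a few destinations, the deferred lines in the log can be grouped by relay. This is only a rough sketch: it assumes the usual `relay=...` and `status=deferred` fields in Postfix delivery log lines, and the function name `count_deferred_by_relay` is made up for illustration.

```shell
# Summarize deferred deliveries per relay from a Postfix log file.
# Assumption: delivery lines carry "relay=<host>[<ip>]:<port>," and
# "status=deferred". count_deferred_by_relay is a hypothetical helper name.
count_deferred_by_relay() {
    grep 'status=deferred' "$1" |   # keep only deferred deliveries
        grep -o 'relay=[^,]*' |     # extract the relay field
        sort | uniq -c | sort -rn   # count per relay, busiest first
}

# Usage, with the same placeholder path as the grep command above:
# count_deferred_by_relay /the/maillog/file
```

If only one or two relays show up with meaningful counts, that supports the point that the deferrals themselves are not the problem.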
If I just leave things alone, each of these threads slowly times out
until all smtp threads exit. When the last thread finally exits,
Postfix immediately spawns all 110 smtp threads in an effort to catch
up. It's acting like something has asked Postfix to restart gracefully
and so it will not spawn any new threads until the last thread has
exited (which takes several minutes). Every time this happens, it
causes delivery delays to hosts that we are not having any deferral
issues with at all.
It's apparent that there's something unique about our configuration, as
it does not sound like the issue I'm seeing is a common one. I'm sure
we'll get to the bottom of it, and will report back when we do...
Curtis