On Sun, Sep 14, 2014 at 09:39:21PM -0700, Will Yardley wrote:

> I've had Postfix seemingly stop responding 3 times on 2 systems over the
> weekend. In all cases, the system is accessible, but logging in is
> sluggish, however, load, iowait, memory usage, etc. all appear perfectly
> normal. Postfix doesn't log any unusual errors / warnings / etc., but
> simply seems to stop logging abruptly. So it seems like it may be a
> kernel param or ulimit issue, vs. Postfix config or performance
> constraint.
Postfix does not log anything when nothing is happening, so if mail
input is frozen, naturally not much is logged.

I've seen kernel bugs that freeze the queue manager: a socket becomes
ready to read, but the Linux kernel never notifies the process that is
waiting for it to become readable. The following possibilities come to
mind:

    * I/O between qmgr(8) and trivial-rewrite(8) is delayed for a long
      time because the kernel fails to deliver a ready notification.

    * The smtpd(8) process that currently holds the acceptor lock is
      stuck (due to a kernel bug), and all the other smtpd(8) processes
      are waiting for it to release the lock.

    * The cleanup(8) process that currently holds the acceptor lock is
      stuck (due to a kernel bug), and all the other cleanup(8)
      processes are waiting for it to release the lock.

    * Your DNS resolution is not functioning. That could also explain
      the slow logins, but in that case you would generally max out the
      smtpd(8) process limit.

You need to determine which smtpd(8) process (if any) is blocked in
accept(2) while all the rest are blocked trying to obtain the lock.
Likewise, determine which cleanup(8) process (if any) is blocked in
accept(2) while all the rest are blocked trying to obtain the lock.
Alternatively, determine whether qmgr(8) is blocked on I/O with
trivial-rewrite(8).

-- 
Viktor.
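The checks described above (which smtpd(8) or cleanup(8) process sits in accept(2) while its siblings wait for the lock, and what qmgr(8) is waiting on) can be approximated by taking a snapshot of each Postfix daemon's kernel wait state. The sketch below is a minimal, hypothetical helper, not part of Postfix. It assumes a Linux /proc filesystem, that the daemons' short process names (/proc/PID/comm) are exactly "smtpd", "cleanup", "qmgr" and "trivial-rewrite", and that it runs as root, since /proc/PID/syscall of another user's process is normally unreadable. The odd_one_out() heuristic (flag any process whose wait channel differs from the majority of its siblings) is an illustrative assumption, not an established diagnostic.

```python
import os
from collections import Counter, defaultdict

# Assumption: Postfix daemon names as they appear in /proc/PID/comm.
DAEMONS = ("smtpd", "cleanup", "qmgr", "trivial-rewrite")


def read_proc(pid, name):
    """Return /proc/<pid>/<name> stripped, or '?' if it cannot be read
    (process exited, permission denied, file absent on this kernel)."""
    try:
        with open(f"/proc/{pid}/{name}") as f:
            return f.read().strip()
    except OSError:
        return "?"


def snapshot(daemons=DAEMONS):
    """Map each daemon name to a list of (pid, wchan, syscall) tuples.

    wchan is the kernel function the process sleeps in; syscall starts
    with the (architecture-specific) number of the system call it is
    blocked in, or reads 'running'."""
    result = defaultdict(list)
    try:
        pids = [p for p in os.listdir("/proc") if p.isdigit()]
    except OSError:  # no /proc: not Linux, nothing to inspect
        return result
    for pid in pids:
        comm = read_proc(pid, "comm")
        if comm in daemons:
            result[comm].append(
                (int(pid), read_proc(pid, "wchan"), read_proc(pid, "syscall")))
    return result


def odd_one_out(procs):
    """Hypothetical heuristic: return the processes whose wait channel
    differs from the most common one among their siblings."""
    if not procs:
        return []
    common, _ = Counter(w for _, w, _ in procs).most_common(1)[0]
    return [p for p in procs if p[1] != common]


if __name__ == "__main__":
    procs = snapshot()
    for name in DAEMONS:
        group = sorted(procs.get(name, []))
        for pid, wchan, syscall in group:
            print(f"{name:16} {pid:>7} {wchan:24} {syscall.split(' ')[0]}")
        for pid, wchan, _ in odd_one_out(group):
            print(f"  -> {name} pid {pid} differs from its siblings "
                  f"(wchan={wchan})")
```

A single process whose wait state differs from all its siblings is only a hint about where to look; attaching strace -p to that PID (and to qmgr) shows more directly which system call it is stuck in. With only one or two processes of a given type, the comparison says little.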