On Sun, Sep 14, 2014 at 09:39:21PM -0700, Will Yardley wrote:

> I've had Postfix seemingly stop responding 3 times on 2 systems over the
> weekend. In all cases, the system is accessible, but logging in is
> sluggish, however, load, iowait, memory usage, etc. all appear perfectly
> normal. Postfix doesn't log any unusual errors / warnings / etc., but
> simply seems to stop logging abruptly. So it seems like it may be a
> kernel param or ulimit issue, vs. Postfix config or performance
> constraint.

Postfix will not log anything when nothing is happening, so if mail
input is frozen, naturally not much is logged.

I've seen kernel bugs that freeze the queue manager through lost
I/O notifications: a socket is ready to read, but the Linux kernel
fails to inform the process waiting for it to become readable.

The following possibilities come to mind:

    * I/O between qmgr(8) and trivial-rewrite(8) is delayed for a long
      time, when the kernel fails to send a ready notification.

    * The active smtpd(8) acceptor is stuck (due to a kernel bug),
      and all the other smtpd(8) processes are waiting for the stuck
      process to release the acceptor lock.

    * The active cleanup(8) acceptor is stuck (due to a kernel bug),
      and all the other cleanup(8) processes are waiting for the stuck
      process to release the acceptor lock.

    * Your DNS resolution is not functioning, which could also
      explain slow login, but in this case, you'd generally max
      out the smtpd(8) process limit.
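
To rule the DNS possibility in or out, it may help to time a name
lookup on the affected host while the problem is occurring. The
following is only a rough sketch in Python: timed_lookup is a
hypothetical helper name, and it goes through the system resolver
via socket.getaddrinfo, which is not necessarily the same code path
Postfix uses for its own lookups.

```python
import socket
import time


def timed_lookup(host, resolver=socket.getaddrinfo):
    """Resolve host once; return (succeeded, elapsed_seconds).

    resolver is injectable so the helper can be exercised without
    network access; by default it is the system resolver.
    """
    start = time.monotonic()
    try:
        resolver(host, None)
        ok = True
    except OSError:
        # socket.gaierror is a subclass of OSError, so resolution
        # failures land here.
        ok = False
    return ok, time.monotonic() - start
```

Calling timed_lookup() with a name the machine normally resolves,
and seeing it fail or take many seconds, would point towards the
resolver rather than the kernel.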

You need to determine which smtpd(8) process (if any) is blocked
in accept(2) while all the rest are blocked trying to obtain a lock.

You need to determine which cleanup(8) process (if any) is blocked
in accept(2) while all the rest are blocked trying to obtain a lock.

Alternatively, determine whether qmgr(8) is blocked on I/O with
trivial-rewrite(8).
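
One way to get a first look at the above on Linux is to group the
relevant processes by the kernel function they are sleeping in.
What follows is a minimal sketch, not anything Postfix ships: it
assumes /proc/PID/comm and /proc/PID/wchan are readable (wchan may
need root and on some kernels shows only "0"), and the snapshot()
and format_report() names are mine. A single smtpd or cleanup
process sitting in a different wait channel from its siblings
would be the one to examine more closely, for example with
strace(1).

```python
import os
from collections import defaultdict

# Process names of interest, as they appear in /proc/PID/comm.
WATCHED = {"smtpd", "cleanup", "qmgr", "trivial-rewrite"}


def read_proc_file(pid, name, proc_root="/proc"):
    """Return the stripped contents of proc_root/pid/name, or None."""
    try:
        with open(os.path.join(proc_root, pid, name)) as fh:
            return fh.read().strip()
    except OSError:
        # The process may have exited, or we lack permission.
        return None


def snapshot(proc_root="/proc"):
    """Map (comm, wchan) to the list of PIDs in that state."""
    groups = defaultdict(list)
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return groups  # no procfs here; nothing to report
    for pid in entries:
        if not pid.isdigit():
            continue
        comm = read_proc_file(pid, "comm", proc_root)
        if comm not in WATCHED:
            continue
        wchan = read_proc_file(pid, "wchan", proc_root) or "?"
        groups[(comm, wchan)].append(int(pid))
    return groups


def format_report(groups):
    """One line per (comm, wchan) pair, with the PIDs in that state."""
    lines = []
    for (comm, wchan), pids in sorted(groups.items()):
        lines.append(f"{comm:16} {wchan:32} {sorted(pids)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(snapshot()))
```

Running it a few times some seconds apart shows whether the same
process stays parked in the same place.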

-- 
        Viktor.
