wie...@porcupine.org (Wietse Venema) wrote:
> The Postfix process count corresponds to the number of client
> sessions.  If the number of processes goes up, either the number
> of clients goes up or the sessions last longer, i.e. clients aren't
> closing the connection.  You would notice that from the "connection
> timed out" events in the maillog file.
> 

There are hardly any timeout messages in the logs: out of ~10,000
"status=sent" entries, only 61 lines contain "timed out".

When allowing only the L4 health check (and no email passing through), we
get into this situation:


Right after a Postfix restart:

smtp01:/etc/postfix# ps -edaf| grep smtpd | wc -l
       9
smtp01:/etc/postfix# netstat -na | grep ESTABLISHED | wc -l
      30


10 minutes later:

smtp01:/etc/postfix# ps -edaf| grep smtpd | wc -l
     101
smtp01:/etc/postfix# netstat -na | grep ESTABLISHED | wc -l
      32


The connections are indeed closed and removed (they are not sitting in
TIME_WAIT etc. either).


The truss output of an smtpd that ends up waiting 300s on sync, because of
socket 11, looks like:

15870:   0.0001 accept(6, 0x08047810, 0x0804780C, SOV_DEFAULT)  = 11
15870:          AF_INET  name = 172.20.12.3  port = 1786
15870:   0.0001 setsockopt(11, SOL_SOCKET, SO_KEEPALIVE, 0x080477DC, 4, SOV_DEFAULT) = 0
15870:   0.0001 fcntl(11, F_GETFL)                              = 130
15870:   0.0000 fcntl(11, F_SETFL, FWRITE)                      = 0
15870:   0.0001 fcntl(11, F_GETFD, 0x00000000)                  = 0
15870:   0.0001 fcntl(11, F_SETFD, 0x00000001)                  = 0
15870:   0.0001 getpeername(11, 0x0804769C, 0x080471BC, SOV_DEFAULT) = 0
15870:   0.0000 getsockname(11, 0x080474A0, 0x0804746C, SOV_DEFAULT) = 0
15870:   0.0000 pollsys(0x080474C8, 1, 0x080474A0, 0x00000000)  = 1
15870:   0.0001 write(11, " 2 2 0   s m t p . z e r".., 32)     = 32
15870:   0.0006 pollsys(0x080474F8, 1, 0x080474D0, 0x00000000)  = 1
15870:   0.0000 read(11, " Q U I T\r\n", 4096)                  = 6
15870:   0.0001 ioctl(11, FIONREAD, 0x08047614)                 Err#131 ECONNRESET

pollsys:entry fd = 11, events = 4, revents = 0

15870: 300.0100 pollsys(0x08047828, 1, 0x08047800, 0x00000000)  = 0

RETURN pid: 15870 Pollsys thread = 1 returns 1:1

15870:   0.0003 close(11)                                       = 0

smtpd does receive ECONNRESET, yet it still ends up in timed_write for 300s
trying to flush the reply.

The L4 health check appears to send "QUIT" and then drop the connection
without waiting for the "221 2.0.0 Bye" reply, which smtpd then tries to
flush.  Apart from lowering "smtpd_timeout" so that the flush gives up
sooner, are there any other options I could use?
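
For comparison, here is a rough sketch of what I would expect a well-behaved
check to do: read the 220 greeting, send QUIT, and wait for the 221 reply
before closing, so smtpd has nothing left to flush.  This is only an
illustration in Python.  The function names (read_reply, smtp_health_check)
and the 10s timeout are my own assumptions, and I do not know whether the L4
device can be configured to behave this way.

```python
import socket


def read_reply(reader):
    """Read one SMTP reply (possibly multi-line); return (code, lines)."""
    lines = []
    while True:
        raw = reader.readline()
        if not raw:
            raise ConnectionError("server closed the connection mid-reply")
        line = raw.decode("ascii", errors="replace").rstrip("\r\n")
        lines.append(line)
        # "250-text" continues a reply; "250 text" (or a bare code) ends it.
        if len(line) < 4 or line[3] != "-":
            return int(line[:3]), lines


def smtp_health_check(host, port=25, timeout=10.0):
    """Return True if the server greets with 220 and answers QUIT with 221.

    Hypothetical helper, not part of Postfix or of any L4 product.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            code, _ = read_reply(reader)
            if code != 220:
                return False
            sock.sendall(b"QUIT\r\n")
            # Wait for "221 ... Bye" before closing the socket, instead of
            # dropping the connection right after sending QUIT.
            code, _ = read_reply(reader)
            return code == 221
```

Calling smtp_health_check("smtp01") would then take one full, short SMTP
session per probe, rather than leaving an smtpd behind waiting on a dead
socket.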

Thanks for your replies.

Lund
