Hello list,

In November we updated OpenSSL because of the latest security alerts and, at the same time, rolled out Postfix 2.7.2 everywhere (in-house package build).

Since that day, the graphs charting the number of smtpd processes have gone from ~100 to ~500. So it is definitely a recent change of ours that made something worse.

I have noticed that if I set up the L4 to do only the health checks, without passing through any email, the number of smtpd processes balloons to ~1000, then shrinks down to a stable ~400. They are all idle, waiting for a timeout. Presumably there is some process reuse in Postfix to save on the forks?

Running truss on one of the smtpd processes shows:

  25964:   0.0000 read(11, " Q U I T\r\n", 4096) = 6
  25964:   0.0001 ioctl(11, FIONREAD, 0x080475E4) Err#131 ECONNRESET
  25964:   0.0000 time() = 1418196042
  25964:   0.0000 fxstat(2, 7, 0x08046B68) = 0
  25964:   0.0000 time() = 1418196042
  25964: 300.0100 pollsys(0x080477F8, 1, 0x080477D0, 0x00000000) = 0
  25964:   0.0002 close(11) = 0
  25964:   0.0001 write(5, " l e\0\0 Z\t\0\001\0\0\0", 12) Err#32 EPIPE
  25964:   0.0001 Received signal #13, SIGPIPE [ignored]
  25964:   0.0001 _exit(0)

Which timeout could this be? I wouldn't expect the L4 health checks, which only connect and then issue "QUIT", to bring the process count to 1,000. I have gone through many of the 300s timeouts in postconf.

The stack of the waiting smtpd processes is:

  fe9b101a poll (8047798, 1, 2710, 0) + 52
  08098667 write_wait (b, a, 0, 0) + 3b
  0809ad30 timed_write (b, 8122210, f, a, 0, 0) + 20
  080961c3 vstream_fflush_some (8113620, f, 8047850, 0) + 13e
  08096c3b vstream_fflush (8113620, 0, 0, 0) + 3c
  08096c7c vstream_fclose (8113620, 8047e29, 8047d94) + 39
  08070482 single_server_wakeup (b, 2, 0, fea0b7e5) + ee

which leads me to believe it is the stream->timeout used in vstream_fflush_some(). That in turn points to smtpd_timeout. With

  smtpd_timeout = 20

the processes balloon to only about ~110 on start. Better, but not really solving the problem.

But perhaps the better question is: why does the flush need to time out at all?

I am guessing the flush is on fd 11 (0xb in the stack), as that is the descriptor being written to:

  11: S_IFSOCK mode:0666 dev:296,0 ino:15774 uid:0 gid:0 size:0
      O_RDWR FD_CLOEXEC
      SOCK_STREAM
      SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(49152),IP_NEXTHOP(0.192.0.0)
      sockname: AF_INET 0.0.0.0  port: 0

One answer could be to move to a more recent version of Postfix, but that is a considerably larger task. Was there a known issue with the flush, and would upgrading definitely solve it?

Thanks for any insight,

Lund

alias_database = btree:/etc/postfix/aliases
alias_maps =
bounce_queue_lifetime = 0
bounce_size_limit = 1
broken_sasl_auth_clients = yes
command_directory = /usr/sbin
config_directory = /etc/postfix
content_filter = smtp-amavis:[127.0.0.1]:10024
daemon_directory = /usr/libexec/postfix
data_directory = /var/lib/postfix
debug_peer_level = 2
default_database_type = hash
disable_vrfy_command = yes
html_directory = no
inet_interfaces = all
mail_owner = postfix
mailq_path = /usr/bin/mailq
manpage_directory = /usr/local/man
maximal_queue_lifetime = 1d
message_size_limit = 150000000
mydestination =
mydomain = <removed>
myhostname = <removed>
mynetworks = 172.16.0.0/16, 127.0.0.0/8, 172.20.12.0/24, 172.20.11.0/24
mynetworks_style = subnet
newaliases_path = /usr/bin/newaliases
queue_directory = /var/spool/postfix
queue_run_delay = 15m
readme_directory = no
sample_directory = /etc/postfix
sendmail_path = /usr/lib/sendmail
setgid_group = postdrop
smtp_fallback_relay = [172.17.26.5], [172.17.26.6], [172.17.26.7], [172.17.26.8]
smtp_tls_security_level = may
smtpd_banner = $myhostname ESMTP $mail_name
smtpd_client_restrictions = permit_mynetworks, hash:/etc/postfix/access, permit_sasl_authenticated,
smtpd_delay_reject = no
smtpd_helo_required = yes
smtpd_recipient_restrictions = permit_sasl_authenticated, permit_mynetworks,
    reject_unauth_destination, reject_invalid_hostname,
    reject_non_fqdn_recipient, reject_non_fqdn_sender,
    reject_unknown_sender_domain, reject_unknown_recipient_domain,
    reject_unauth_pipelining
    check_sender_access hash:/etc/postfix/access
smtpd_sasl_auth_enable = yes
smtpd_sasl_local_domain = localhost
smtpd_sasl_path = private/auth
smtpd_sasl_security_options = noanonymous
smtpd_sasl_type = dovecot
smtpd_tls_loglevel = 1
smtpd_tls_security_level = may
smtpd_tls_session_cache_timeout = 3600s
smtpd_use_tls = yes
unknown_local_recipient_reject_code = 450
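
For anyone who wants to reproduce the health-check pattern without the L4 in the path, a small probe along the following lines might do. It is only a sketch under assumptions: it guesses that the L4 sends QUIT and then resets the connection rather than closing it cleanly (which is what the ECONNRESET right after the read of QUIT suggests), and the function name probe and its parameters are made up for illustration. The SO_LINGER packing assumes a POSIX-style struct linger of two ints.

```python
import socket
import struct


def probe(host, port, timeout=5.0):
    """Connect, read the SMTP banner, send QUIT, then reset the connection.

    Hypothetical helper for reproduction only; not part of Postfix and not
    necessarily what the L4 really does.
    """
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        # A single recv is assumed to be enough for the short greeting line.
        banner = sock.recv(4096).decode("ascii", errors="replace")
        sock.sendall(b"QUIT\r\n")
        # l_onoff=1, l_linger=0: close() aborts the connection with a RST
        # instead of a normal FIN, so the server side sees ECONNRESET.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
    finally:
        sock.close()
    return banner.strip()
```

Calling probe("127.0.0.1", 25) in a loop against a test instance while watching the smtpd process count should show whether the reset after QUIT alone is enough to leave processes waiting in the flush.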