Hello list,

In November we updated OpenSSL in response to the latest security alerts and,
at the same time, rolled out Postfix-2.7.2 everywhere (our in-house package build).

Since that day, the graphs charting the number of smtpd processes have gone
from ~100 to ~500, so it is almost certainly one of these recent changes that
made things worse.

I have noticed that if I set up the L4 to do only the health checks, without
passing through any email, the number of smtpd processes balloons to ~1000,
then shrinks down to a stable ~400.

They are all idle, waiting for a timeout. Presumably there is some process
reuse in Postfix to save on the forks?


Running truss on one of the smtpd processes shows:

25964:   0.0000 read(11, " Q U I T\r\n", 4096)                  = 6
25964:   0.0001 ioctl(11, FIONREAD, 0x080475E4)                 Err#131 ECONNRESET
25964:   0.0000 time()                                          = 1418196042
25964:   0.0000 fxstat(2, 7, 0x08046B68)                        = 0
25964:   0.0000 time()                                          = 1418196042

25964:  300.0100        pollsys(0x080477F8, 1, 0x080477D0, 0x00000000)  = 0

25964:   0.0002 close(11)                                       = 0
25964:   0.0001 write(5, " l e\0\0 Z\t\0\001\0\0\0", 12)        Err#32 EPIPE
25964:   0.0001     Received signal #13, SIGPIPE [ignored]
25964:   0.0001 _exit(0)
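
For a longer trace, the long wait can be picked out mechanically. The following is only a minimal sketch: it assumes the second column is the per-call time delta (as `truss -D` prints it), and the function name `slow_calls` and the 1-second threshold are made up for illustration.

```python
import re

# Matches lines such as "25964:  300.0100  pollsys(...) = 0":
# pid, time delta in seconds, then the traced call.
TRUSS_LINE = re.compile(r"^(\d+):\s+(\d+\.\d+)\s+(\S.*)$")

def slow_calls(lines, threshold=1.0):
    """Return (pid, seconds, call) for each traced line whose delta >= threshold."""
    hits = []
    for line in lines:
        m = TRUSS_LINE.match(line.strip())
        if m and float(m.group(2)) >= threshold:
            hits.append((int(m.group(1)), float(m.group(2)), m.group(3)))
    return hits

# Three lines copied from the trace above.
sample = [
    '25964:   0.0000 read(11, " Q U I T\\r\\n", 4096)                  = 6',
    "25964:  300.0100        pollsys(0x080477F8, 1, 0x080477D0, 0x00000000)  = 0",
    "25964:   0.0002 close(11)                                       = 0",
]

for pid, secs, call in slow_calls(sample):
    print(pid, secs, call)
```

On the sample lines, only the pollsys() call is reported, i.e. the 300-second wait in question.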

Which timeout could this be? I wouldn't expect the L4 health checks, which
only connect and then issue "QUIT", to bring the process count to 1,000. I have
gone through many of the 300s timeouts in postconf.
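
To reproduce this without the L4 in the path, a probe along the following lines could stand in for the health check. This is a sketch under assumptions: the function name `smtp_quit_probe` is hypothetical, it expects a single-line banner, and it waits for the reply to QUIT before closing, whereas the ECONNRESET in the trace suggests the real probe may drop the connection sooner.

```python
import socket

def smtp_quit_probe(host, port=25, timeout=5.0):
    """Connect, read the banner, send QUIT, and return (banner, reply).

    Assumes a single-line greeting; a multi-line "220-" banner would
    need a loop over the continuation lines.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rb") as reader:
        banner = reader.readline().decode("ascii", "replace").rstrip()
        sock.sendall(b"QUIT\r\n")
        reply = reader.readline().decode("ascii", "replace").rstrip()
    return banner, reply
```

Calling `smtp_quit_probe("127.0.0.1")` in a loop while watching the smtpd process count should show whether connect-then-QUIT alone is enough to leave processes waiting.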


The stack of a waiting smtpd process is:

 fe9b101a poll     (8047798, 1, 2710, 0) + 52
 08098667 write_wait (b, a, 0, 0) + 3b
 0809ad30 timed_write (b, 8122210, f, a, 0, 0) + 20
 080961c3 vstream_fflush_some (8113620, f, 8047850, 0) + 13e
 08096c3b vstream_fflush (8113620, 0, 0, 0) + 3c
 08096c7c vstream_fclose (8113620, 8047e29, 8047d94) + 39
 08070482 single_server_wakeup (b, 2, 0, fea0b7e5) + ee

which leads me to believe it is the stream->timeout used in
vstream_fflush_some(). That in turn appears to be governed by smtpd_timeout,
so I set:

smtpd_timeout = 20

With that setting, the process count balloons to only ~110 on start. Better,
but it does not really solve the problem.

But perhaps the better question is: why does the flush need to time out at all?

I am guessing the flush is on fd = 11, as that is the one being written to:

  11: S_IFSOCK mode:0666 dev:296,0 ino:15774 uid:0 gid:0 size:0
      O_RDWR FD_CLOEXEC
        SOCK_STREAM

        SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(49152),IP_NEXTHOP(0.192.0.0)
        sockname: AF_INET 0.0.0.0  port: 0
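
The fd guess can be cross-checked against the stack arguments. The sketch below assumes the arguments in the stack listing are printed in hexadecimal, that the third argument to poll() is its timeout in milliseconds, and that write_wait() takes (fd, timeout in seconds); the helper name `decode_hex_args` is made up for illustration.

```python
def decode_hex_args(frame_args):
    """Convert a stack-frame argument string such as "8047798, 1, 2710, 0" to ints."""
    return [int(tok, 16) for tok in frame_args.split(",")]

# Argument lists copied from the poll and write_wait frames above.
poll_args = decode_hex_args("8047798, 1, 2710, 0")
write_wait_args = decode_hex_args("b, a, 0, 0")

print("poll timeout (ms):", poll_args[2])                  # 0x2710 -> 10000
print("write_wait first two args:", write_wait_args[:2])   # 0xb, 0xa -> [11, 10]
```

Under those assumptions, the first argument decodes to 11, which matches fd 11, and the wait in this particular stack is 10 seconds.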



One answer could be to move to a more recent version of Postfix, but that
is a considerably larger task. Is there a known issue with the flush, and
would upgrading definitely solve it?


Thanks for any insight,

Lund




alias_database = btree:/etc/postfix/aliases
alias_maps =
bounce_queue_lifetime = 0
bounce_size_limit = 1
broken_sasl_auth_clients = yes
command_directory = /usr/sbin
config_directory = /etc/postfix
content_filter = smtp-amavis:[127.0.0.1]:10024
daemon_directory = /usr/libexec/postfix
data_directory = /var/lib/postfix
debug_peer_level = 2
default_database_type = hash
disable_vrfy_command = yes
html_directory = no
inet_interfaces = all
mail_owner = postfix
mailq_path = /usr/bin/mailq
manpage_directory = /usr/local/man
maximal_queue_lifetime = 1d
message_size_limit = 150000000
mydestination =
mydomain = <removed>
myhostname = <removed>
mynetworks = 172.16.0.0/16, 127.0.0.0/8, 172.20.12.0/24, 172.20.11.0/24
mynetworks_style = subnet
newaliases_path = /usr/bin/newaliases
queue_directory = /var/spool/postfix
queue_run_delay = 15m
readme_directory = no
sample_directory = /etc/postfix
sendmail_path = /usr/lib/sendmail
setgid_group = postdrop
smtp_fallback_relay = [172.17.26.5], [172.17.26.6], [172.17.26.7],
    [172.17.26.8]
smtp_tls_security_level = may
smtpd_banner = $myhostname ESMTP $mail_name
smtpd_client_restrictions = permit_mynetworks,
    hash:/etc/postfix/access,
    permit_sasl_authenticated,
smtpd_delay_reject = no
smtpd_helo_required = yes
smtpd_recipient_restrictions = permit_sasl_authenticated,
    permit_mynetworks,
    reject_unauth_destination,
    reject_invalid_hostname,
    reject_non_fqdn_recipient,
    reject_non_fqdn_sender,
    reject_unknown_sender_domain,
    reject_unknown_recipient_domain,
    reject_unauth_pipelining
    check_sender_access hash:/etc/postfix/access
smtpd_sasl_auth_enable = yes
smtpd_sasl_local_domain = localhost
smtpd_sasl_path = private/auth
smtpd_sasl_security_options = noanonymous
smtpd_sasl_type = dovecot
smtpd_tls_loglevel = 1
smtpd_tls_security_level = may
smtpd_tls_session_cache_timeout = 3600s
smtpd_use_tls = yes
unknown_local_recipient_reject_code = 450

