>>> You're comparing apples and oranges, the RHEL 6 hosts don't receive
>>> nearly enough traffic to be congested, they would perhaps be equally
>>> congested under the same load. However, they may have sensibly
>>> configured logging with TLS loglevel 1, and/or no synchronous log
>>> writes.

On our Symantec DLP servers running RHEL 6.4 and RHEL 5.10, the TLS
loglevel is set to 2 in /etc/postfix/main.cf, and syslog for mail
activity is set to non-synchronous log writes in /etc/rsyslog.conf.

    smtp_tls_loglevel = 2
    smtpd_tls_loglevel = 2

    # Log all the mail messages in one place.
    mail.*    -/var/log/maillog

I do not know why the RHEL 5.10 server got much more traffic than the
other RHEL 6.4 servers during this morning's peak hours.

>>> Fix your logging, then measure again. A concurrency of 20 may be
>>> sufficient when the log level is sane.

On this RHEL 5.10 server, I have changed the TLS loglevel from 2 to 1
in /etc/postfix/main.cf. We will keep watching the logs during peak
hours this week.

    smtp_tls_loglevel = 1
    smtpd_tls_loglevel = 1

>>> Ask the vendor whether they want you to use MX indirection or not.

Last time, the Symantec vendor told us it was fine to do MX lookups, and
the Windows FOPE vendor recommended that we do MX lookups.

Anyway, thank you so much for your great help! I am really learning a
lot from you!!!

Good night,
Carl

-----Original Message-----
From: owner-postfix-us...@postfix.org [mailto:owner-postfix-us...@postfix.org] On Behalf Of Viktor Dukhovni
Sent: Monday, April 28, 2014 7:45 PM
To: postfix-users@postfix.org
Subject: Re: Backlog to outsourced email provider

On Mon, Apr 28, 2014 at 11:05:56PM +0000, Xie, Wei wrote:

> header_checks = regexp:/etc/postfix/header_checks
> relayhost = mail.us.messaging.microsoft.com

This is effectively a miniature transport entry:

    relay_transport = relay:mail.us.messaging.microsoft.com
    default_transport = relay:mail.us.messaging.microsoft.com

Don't know whether the vendor intends for you to do MX lookups here
or not (you're doing MX lookups). The MX record just returns the
original hostname.

    $ dig +noall +ans -t mx mail.us.messaging.microsoft.com
    mail.us.messaging.microsoft.com. IN MX 10 mail.us.messaging.microsoft.com.
    $ dig +noall +ans -t a mail.us.messaging.microsoft.com
    mail.us.messaging.microsoft.com. IN A 216.32.181.178
    mail.us.messaging.microsoft.com. IN A 216.32.180.22

> smtp_tls_CAfile = /etc/postfix/service_certs/osu_ues/DigiCertCA.crt
> smtp_tls_loglevel = 2
> smtpd_tls_loglevel = 2

You're killing your syslog daemon with debug logging. Why is the
TLS loglevel set to 2? Have you looked at your logs? They are full
of debugging noise and likely severely limit performance. For normal
operation set the log level to 1. Also make sure your syslogd is
not doing synchronous logging of each log entry.

> smtp_tls_note_starttls_offer = yes

Futile, given:

> smtp_tls_security_level = encrypt

> Here are the settings for the following two parameters:
>
> default_destination_concurrency_limit = 20

Fix your logging, then measure again. A concurrency of 20 may be
sufficient when the log level is sane.

> smtp_destination_concurrency_limit = $default_destination_concurrency_limit

This is redundant.

> >>Either increase concurrency or reduce latency. Network captures may show
> >>which protocol stage is responsible for most of the delay, even with TLS
> >>one can tell whether the delay is at the beginning or at the end of the
> >>TLS session or just low bandwidth throughout.
>
> We prefer to increase concurrency.

The vendor might limit your concurrency, don't do that quite yet.

> >>How is the relay specified with or without surrounding "[]"?
>
> Without surrounding "[]".
>
> relayhost = mail.us.messaging.microsoft.com

Ask the vendor whether they want you to use MX indirection or not.

> On this RHEL 5.10 server, today 10:30:00 ~ 10:59:59 the output rate of
> email to this domain in the 30 minutes was 10,928.
>
> On other 6 RHEL 6.4 servers, today 10:30:00 ~ 10:59:59 the output rate
> of email to this domain in the 30 minutes were 4,824 ~ 6,564.

You're comparing apples and oranges, the RHEL 6 hosts don't receive
nearly enough traffic to be congested, they would perhaps be equally
congested under the same load.
However, they may have sensibly configured logging with TLS loglevel 1,
and/or no synchronous log writes.

> Today 10:00:00 ~ 10:29:59 the output rate of email to this relay in
> the 30 minutes was 9,623.
>
> Today 10:30:00 ~ 10:59:59 the output rate of email to this relay in
> the 30 minutes was 10,928.

That's more like it:

    Throughput * Latency = Concurrency

    10928 / 1800 * 2.8 = 17.0

So with latencies around 2.8 seconds your estimated concurrency is
~17, which is close enough to 20. The problem is either that your
syslogd is overwhelmed and too slow or the vendor service is too
slow. Fix the first problem first.

> Today 11:00:00 ~ 11:29:59 the output rate of email to this relay in
> the 30 minutes was 15,597.

    15597 / 1800 * 2.8 = 24.3

So the latency number from that one message is likely a bit above
average. Understand and memorize this simple formula:

    Throughput = Concurrency / Latency

Fix your logging settings in main.cf and make sure that you follow
the advice at the bottom of:

    http://www.postfix.org/LINUX_README.html

    Syslogd performance

    LINUX syslogd uses synchronous writes by default. Because of
    this, syslogd can actually use more system resources than
    Postfix. To avoid such badness, disable synchronous mail logfile
    writes by editing /etc/syslog.conf and by prepending a "-" to
    the logfile name:

    /etc/syslog.conf:
        mail.*    -/var/log/mail.log

    Send a "kill -HUP" to the syslogd to make the change effective.

-- 
Viktor.
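
For anyone who wants to rerun the concurrency estimate with their own numbers, here is a minimal Python sketch of the formula from the message above. The function names are hypothetical, and the 2.8 second latency is the single observed value quoted in the thread, not a measured average, so treat the output as a rough estimate only.

```python
# Sketch of the formula quoted in the thread:
#     Throughput * Latency = Concurrency
#     Throughput = Concurrency / Latency
# Function names are hypothetical, chosen for this illustration only.


def estimated_concurrency(messages: int, window_seconds: float,
                          latency_seconds: float) -> float:
    """Average number of deliveries in flight during the window."""
    throughput = messages / window_seconds  # messages per second
    return throughput * latency_seconds


def max_throughput(concurrency_limit: int, latency_seconds: float) -> float:
    """Upper bound on messages per second for a given concurrency limit."""
    return concurrency_limit / latency_seconds


if __name__ == "__main__":
    window = 30 * 60   # each sample in the thread covers 30 minutes
    latency = 2.8      # seconds; assumption: one observed delivery latency
    limit = 20         # default_destination_concurrency_limit in the thread

    samples = [
        ("10:00-10:29", 9623),
        ("10:30-10:59", 10928),
        ("11:00-11:29", 15597),
    ]
    for label, count in samples:
        conc = estimated_concurrency(count, window, latency)
        print(f"{label}: {count} messages, estimated concurrency {conc:.1f}")

    ceiling = max_throughput(limit, latency) * window
    print(f"ceiling at limit {limit}: about {ceiling:.0f} messages per window")
```

An estimate above the configured limit suggests the assumed latency is higher than the real average for that period, which is the point made about the 11:00 sample.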