RE: Backlog to outsourced email provider

Xie, Wei Mon, 28 Apr 2014 19:05:18 -0700

>>>You're comparing apples and oranges, the RHEL 6 hosts don't receive nearly 
>>>enough traffic to be congested, they would perhaps be equally congested 
>>>under the same load.  However, they may have sensibly configured logging 
>>>with TLS loglevel 1, and/or no synchronous log writes.

On our Symantec DLP servers running on RHEL 6.4 and RHEL 5.10, TLS loglevel is 
set to 2 in /etc/postfix/main.cf and syslog for mail activities is set to 
non-synchronous log writes in /etc/rsyslog.conf. 

smtp_tls_loglevel = 2
smtpd_tls_loglevel = 2

# Log all the mail messages in one place.
mail.*                                                  -/var/log/maillog

I do not know why the RHEL 5.10 server received so much more traffic than the 
RHEL 6.4 servers during this morning's peak hours.

>>> Fix your logging, then measure again.  A concurrency of 20 may be 
>>> sufficient when the log level is sane.

On this RHEL 5.10 server, I have changed the TLS loglevel from 2 to 1 in 
/etc/postfix/main.cf. We will keep watching the logs during peak hours this week.

smtp_tls_loglevel = 1
smtpd_tls_loglevel = 1

>>> Ask the vendor whether they want you to use MX indirection or not.

Last time, the Symantec vendor told us it was fine to do MX lookups, and the 
Windows FOPE vendor recommended that we do MX lookups.

Anyway, thank you so much for the great help! I am really learning a lot from 
you!

Good night,

Carl

-----Original Message-----
From: owner-postfix-us...@postfix.org [mailto:owner-postfix-us...@postfix.org] 
On Behalf Of Viktor Dukhovni
Sent: Monday, April 28, 2014 7:45 PM
To: postfix-users@postfix.org
Subject: Re: Backlog to outsourced email provider

On Mon, Apr 28, 2014 at 11:05:56PM +0000, Xie, Wei wrote:

> header_checks = regexp:/etc/postfix/header_checks
> relayhost = mail.us.messaging.microsoft.com

This is effectively a miniature transport entry:

    relay_transport = relay:mail.us.messaging.microsoft.com
    default_transport = relay:mail.us.messaging.microsoft.com

Don't know whether the vendor intends for you to do MX lookups here or not 
(you're doing MX lookups).  The MX record just returns the original hostname.

    $ dig +noall +ans -t mx mail.us.messaging.microsoft.com
    mail.us.messaging.microsoft.com. IN MX 10 mail.us.messaging.microsoft.com.

    $ dig +noall +ans -t a mail.us.messaging.microsoft.com
    mail.us.messaging.microsoft.com. IN  A 216.32.181.178
    mail.us.messaging.microsoft.com. IN  A 216.32.180.22

> smtp_tls_CAfile = /etc/postfix/service_certs/osu_ues/DigiCertCA.crt
> smtp_tls_loglevel = 2
> smtpd_tls_loglevel = 2

You're killing your syslog daemon with debug logging.  Why is the TLS loglevel 
set to 2?  Have you looked at your logs?  They are full of debugging noise and 
likely severely limit performance.
For normal operation set the log level to 1.  Also make sure your syslogd is 
not doing synchronous logging of each log entry.
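
A quick way to spot this kind of setting is to scan main.cf for TLS log levels above 1. The following is a minimal Python sketch, not a Postfix tool: the function name noisy_tls_loglevels and the sample text are hypothetical, and it only understands plain "name = value" lines that start in column 1.

```python
import re

# Matches "smtp_tls_loglevel = N" or "smtpd_tls_loglevel = N" starting in
# column 1 (main.cf lines that begin with whitespace are continuations).
TLS_LOGLEVEL = re.compile(r"^(smtpd?_tls_loglevel)\s*=\s*(\d+)\s*$")


def noisy_tls_loglevels(main_cf_text, max_level=1):
    """Return the TLS loglevel parameters whose value exceeds max_level."""
    levels = {}
    for line in main_cf_text.splitlines():
        match = TLS_LOGLEVEL.match(line)
        if match:
            # If a parameter is set more than once, the last setting wins.
            levels[match.group(1)] = int(match.group(2))
    return {name: level for name, level in levels.items() if level > max_level}


# Hypothetical excerpt modelled on the settings quoted in this thread.
sample = """\
smtp_tls_security_level = encrypt
smtp_tls_loglevel = 2
smtpd_tls_loglevel = 2
"""
print(noisy_tls_loglevels(sample))
```

An empty result means neither parameter is set above the chosen threshold in the text that was scanned.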

> smtp_tls_note_starttls_offer = yes

Futile, given:

> smtp_tls_security_level = encrypt

> Here are the settings for the following two parameters:
> 
> default_destination_concurrency_limit = 20

Fix your logging, then measure again.  A concurrency of 20 may be sufficient 
when the log level is sane.

> smtp_destination_concurrency_limit = $default_destination_concurrency_limit

This is redundant.

> >>Either increase concurrency or reduce latency.  Network captures may show 
> >>which protocol stage is responsible for most of the delay, even with TLS 
> >>one can tell whether the delay is at the beginning or at the end of the 
> >>TLS session or just low bandwidth throughout.
> 
> We prefer to increase concurrency.

The vendor might limit your concurrency, don't do that quite yet.

> >>How is the relay specified with or without surrounding "[]"?
> 
> Without surrounding "[]".
> 
> relayhost = mail.us.messaging.microsoft.com

Ask the vendor whether they want you to use MX indirection or not.
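
For reference, Postfix skips the MX lookup when the relayhost (or transport nexthop) is written inside "[]", and performs it when the name is bare. A minimal Python sketch of that rule follows; the function name does_mx_lookup is hypothetical, and this only illustrates the syntax, it is not Postfix's actual parser.

```python
def does_mx_lookup(nexthop: str) -> bool:
    """Return True if a nexthop written this way gets an MX lookup.

    "[host]" and "[host]:port" disable MX lookups;
    a bare "host" or "host:port" does not.
    """
    return not nexthop.strip().startswith("[")


for value in (
    "mail.us.messaging.microsoft.com",
    "[mail.us.messaging.microsoft.com]",
    "[mail.us.messaging.microsoft.com]:25",
):
    print(f"relayhost = {value:<40} MX lookup: {does_mx_lookup(value)}")
```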

> On this RHEL 5.10 server, today 10:30:00 ~ 10:59:59 the output rate of 
> email to this domain in the 30 minutes was 10,928.
> 
> On other 6 RHEL 6.4 servers, today 10:30:00 ~ 10:59:59 the output rate 
> of email to this domain in the 30 minutes were 4,824 ~ 6,564.

You're comparing apples and oranges, the RHEL 6 hosts don't receive nearly 
enough traffic to be congested, they would perhaps be equally congested under 
the same load.  However, they may have sensibly configured logging with TLS 
loglevel 1, and/or no synchronous log writes.

> Today 10:00:00 ~ 10:29:59 the output rate of email to this relay in 
> the 30 minutes was 9,623.
> 
> Today 10:30:00 ~ 10:59:59 the output rate of email to this relay in 
> the 30 minutes was 10,928.

That's more like it: Throughput * Latency = Concurrency

    10928 / 1800 * 2.8 = 17.0

So with latencies around 2.8 seconds your estimated concurrency is
~17 which is close enough to 20.  The problem is either that your syslogd is 
overwhelmed and too slow or the vendor service is too slow.
Fix the first problem first.

> Today 11:00:00 ~ 11:29:59 the output rate of email to this relay in 
> the 30 minutes was 15,597.

    15597 / 1800 * 2.8 = 24.3

That exceeds your concurrency limit of 20, so the 2.8-second latency number 
from that one message is likely a bit above average.  
Understand and memorize this simple formula:

        Throughput = Concurrency / Latency
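
The two forms of the formula above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, the message counts and the 2.8-second latency are the figures quoted in this thread, and the concurrency of 20 is the configured default_destination_concurrency_limit.

```python
def required_concurrency(messages, window_seconds, latency_seconds):
    """Concurrency = Throughput * Latency."""
    return messages / window_seconds * latency_seconds


def max_throughput(concurrency, latency_seconds):
    """Throughput (messages per second) = Concurrency / Latency."""
    return concurrency / latency_seconds


def implied_latency(concurrency, messages, window_seconds):
    """Average latency that a given concurrency and throughput imply."""
    return concurrency / (messages / window_seconds)


WINDOW = 30 * 60  # each measurement in the thread covers 30 minutes

# Concurrency needed to sustain each observed rate at 2.8 s per delivery.
for count in (10928, 15597):
    print(count, round(required_concurrency(count, WINDOW, 2.8), 1))

# Most messages a concurrency of 20 can deliver in 30 minutes at 2.8 s each.
print(round(max_throughput(20, 2.8) * WINDOW))

# Average latency implied by 15,597 deliveries at a concurrency of 20.
print(round(implied_latency(20, 15597, WINDOW), 2))
```

The last value shows what the average latency would have to be if all 20 delivery slots were busy for the whole half hour; with fewer slots in use, the true average would be lower still.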

Fix your logging settings in main.cf and make sure that you follow the advice 
at the bottom of:

    http://www.postfix.org/LINUX_README.html

        Syslogd performance

        LINUX syslogd uses synchronous writes by default. Because of
        this, syslogd can actually use more system resources than
        Postfix. To avoid such badness, disable synchronous mail logfile
        writes by editing /etc/syslog.conf and by prepending a "-" to
        the logfile name:

            /etc/syslog.conf:
                mail.*                          -/var/log/mail.log

        Send a "kill -HUP" to the syslogd to make the change effective.
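
To check an existing configuration against that advice, one could scan the syslog configuration for mail log files that lack the leading "-". The Python sketch below is a simplified illustration, not a full syslog.conf or rsyslog.conf parser: the function name sync_mail_log_targets and the sample text are hypothetical, and it only looks at classic "selector action" lines that name the mail facility explicitly.

```python
def sync_mail_log_targets(conf_text):
    """Return mail log file targets that are written without a "-" prefix."""
    flagged = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        parts = line.split(None, 1)          # "selector(s)" then "action"
        if len(parts) != 2:
            continue
        selectors, action = parts
        logs_mail = False
        for selector in selectors.split(";"):
            facilities, _, priority = selector.partition(".")
            # "mail.none" excludes mail, so it does not count as mail logging.
            if "mail" in facilities.split(",") and priority != "none":
                logs_mail = True
        # A plain "/path" action has no "-" prefix in front of the file name.
        if logs_mail and action.startswith("/"):
            flagged.append(action)
    return flagged


# Hypothetical configuration excerpt for illustration only.
sample = """\
*.info;mail.none;authpriv.none          /var/log/messages
mail.*                                  /var/log/mail.log
mail.*                                  -/var/log/maillog
"""
print(sync_mail_log_targets(sample))
```

Any path in the result is a candidate for the "-" prefix described in the README excerpt.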

-- 
        Viktor.
