On 5/22/2013 10:13 AM, Viktor Dukhovni wrote:
> On Wed, May 22, 2013 at 04:45:42PM +0300, Selcuk Yazar wrote:
> 
>>>> If your content filter is not very fast, bursts of mail will accumulate<
>>>> while they are waiting to be scanned.  Then the queue becomes empty.
>>>>
>>>> You may also have deferred mail that is retried periodically. You logs
>>>> have a more complete picture.
>>>>
>>>> To improve content filter performance, eliminate remote DNS lookups
>>>> in the filter, or increate concurrency.  If the problem is lack of
>>>> sufficient CPU resources, try to find a more performant scanner or
>>>> turn off optional scanning features you don't need.
>>>>
>>>> Since mail is not delayed for very long, there is no problem (certainly
>>>> not with Postfix itself, but scanning could perhaps be tuned).
>>
>> I found a script for log analyze (sourceforge), result are like below.  I
>> think we have some queue problem, as I understand, %95 e-mails wait in
>> queue 132 seconds ?
> 
> No, less than 5% of messages spend more than 132s in the active
> queue.  Most messages spend less than 21s, with 50%s delivered
> immediately.
> 
>> postfix logwatch
>>
>> === Delivery Delays Percentiles
>> ============================================================
>>                     0%       25%       50%       75%       90%       95%
>>     98%      100%
>> --------------------------------------------------------------------------------------------
>> In qmgr           0.00      0.00      0.01     21.00    110.00    132.00
>>  158.00    180.00
>> Conn setup        0.00      0.00      0.00      0.00      0.85      9.43
>>   51.00    226.00
>> Transmission      0.00      0.11      4.70      6.20     13.00     18.00
>>   22.00     73.00
>> Total             0.04      1.10      9.00     46.00    123.00    154.00
>>  180.00   2373.00
>> ============================================================================================
> 
> To understand what is actually going on, you'll have to *read* the
> logs, not just look at summaries.  

I've been using logwatch for quite some time and I've found the Delivery Delay 
Percentiles '100%' column to be seemingly pulled from thin air.  Don't rely on 
it.  

 === Delivery Delays Percentiles 
============================================================
                     0%       25%       50%       75%       90%       95%       
98%      100%
 
--------------------------------------------------------------------------------------------
 Before qmgr       0.02      0.03      0.06      0.28      0.42      0.88      
2.48      9.20
 In qmgr           0.00      0.02      0.02      0.03      0.03      0.03      
0.04      0.06
 Conn setup        0.00      0.00      0.00      0.00      0.00      0.00      
0.55      2.70
 Transmission      0.03      0.07      0.62      3.10      4.12      5.36      
9.75    232.00
 Total             0.08      0.12      1.50      3.60      4.82      6.90     
11.28    232.00
 
============================================================================================

For instance this summary of yesterday shows 232s for Transmission.  Yet when I 
search my last ~3 days of logs with:

~$ grep local /var/log/mail.log|mawk '{ print($10) }'|grep "delays"
~$ grep smtp /var/log/mail.log|mawk '{ print($10) }'|grep "delays"

the largest value I see is 3.1s, in smtp.   For local all delays are less than 
one second.

> You'll probably find occasional
> latency sending messages through the content filter.  If that's a
> problem, tune the content filter to remove DNS lookups or raise
> its concurrency.  If the content filter is using all available CPU
> resources, tune it to do less, or find a more efficient one.
> 
> Before any of that, locate the log entries showing delayed deliveries,
> read them, and figure out the reasons for the delay.

I'm using spamc/spamd via pipe so it doesn't add to delays in postfix/local log 
stamps.  To see the spamd delays I use:

~$ grep scantime /var/log/mail.log|mawk '{ print($12) }'|cut -f1 -d,

This shows the largest spamd time is 37.7s, the next largest 13.0s.  Some 95% 
appear to be less than 6s.  Summing the largest of these with corresponding 
postfix/local delays doesn't come close to 232s, but less than 40s.

-- 
Stan


Reply via email to