> netstat or lsof? Only the Postfix queue manager knows what deliveries
> are in progress, and it has never evolved a 'live status' API.  None
> of the Postfix daemons has a status query API, it just isn't part of
> the architecture.

I created a way to watch the number of processes running for each of
our four randmap transports (c0, c1, c2, c3) by running the following
once per transport (with cx replaced by c0 through c3):
   ps -f -u postfix | grep smtp_helo_name=mail01-cx | wc -l
The script generates one line that looks like this when there is no load:
   smtp procs: 0, 0, 0, 0 = 0
when we start loading the server with outgoing email it becomes:
   smtp procs: 29, 26, 30, 31 = 116
and increases to the following under maximum load:
   smtp procs: 49, 52, 48, 58 = 207
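
For reference, a minimal sketch of what that counting loop can look
like (not our exact script; it assumes the mail01-c0 through
mail01-c3 helo names used in the grep above):

   #!/bin/sh
   # Count running smtp(8) clients per randmap transport and print
   # one summary line in the "smtp procs: ..." format shown above.
   counts=""; total=0
   for t in c0 c1 c2 c3; do
       n=$(ps -f -u postfix | grep -c "smtp_helo_name=mail01-$t")
       counts="$counts${counts:+, }$n"
       total=$((total + n))
   done
   echo "smtp procs: $counts = $total"
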
I can see the "slowly rises over time" mentioned by Wietse.  I'm not
sure how this relates to maxproc in master.cf, where each of the
randmap transports is set to 128.
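
For anyone following along, entries of that sort in master.cf would
look something like this (a sketch, not our actual file; only the
smtp_helo_name override is shown, and the 7th column is the 128
process limit):

   # service type  private unpriv  chroot  wakeup  maxproc command + args
   c0        unix  -       -       n       -       128     smtp
       -o smtp_helo_name=mail01-c0
   c1        unix  -       -       n       -       128     smtp
       -o smtp_helo_name=mail01-c1
   c2        unix  -       -       n       -       128     smtp
       -o smtp_helo_name=mail01-c2
   c3        unix  -       -       n       -       128     smtp
       -o smtp_helo_name=mail01-c3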

> That state of affairs sounds fine.  Rather than monitoring queue size,
> it may be better to monitor smoothed running averages of the "b", "c"
> and "d" times in:
>
>     delays=a/b/c/d

The first thing I look at is a set of stats by ISP: Email Sent, Ave
Delay, Max Delay and conn_use=.  We are seeing an Ave Delay of 1-2
seconds and conn_use= at 80% for the large ISPs.  If this is not the
case, I dig into why not.
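
Those numbers ultimately come from the delay=, delays= and conn_use=
attributes in the mail log.  A rough sketch of the kind of extraction
involved (not our actual reporting script; it assumes the stock
smtp(8) "status=sent" log attributes, groups by recipient domain
rather than by relay, and the log path is a guess):

   # Per-domain sent count, average/max delay, and the share of
   # deliveries that reused an existing connection (conn_use= is only
   # logged when a connection is reused).
   awk '/ status=sent / {
            d = $0; sub(/.*to=<[^@]*@/, "", d); sub(/>.*/, "", d)
            delay = $0; sub(/.*[ ,]delay=/, "", delay); sub(/,.*/, "", delay)
            delay += 0                      # force numeric comparisons
            sent[d]++; sum[d] += delay
            if (delay > max[d]) max[d] = delay
            if ($0 ~ / conn_use=/) reused[d]++
        }
        END {
            for (d in sent)
                printf "%-15s sent=%d avg=%.2fs max=%.2fs conn_use=%d%%\n",
                       d, sent[d], sum[d] / sent[d], max[d],
                       100 * reused[d] / sent[d]
        }' /var/log/maillog
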

If maxproc is too small for the randmap transports, the Ave Delay will
increase and our throughput will decrease.  We can also see a dramatic
increase (10 times) in transactions per second to our I/O subsystem,
which is SSD-based.  A good run shows steady transactions per second
over time, as it did this morning.  Here is the maximum-load log
interval from this morning (we get a snapshot like this once every 10
seconds in our logs -- this logging does not noticeably change server
performance):

   01:03:57 up 26 days, 19:11,  0 users,  load average: 0.52, 0.33, 0.13
                 total        used        free      shared  buff/cache   available
   Mem:          3.6Gi       642Mi       426Mi       0.0Ki       2.6Gi       2.8Gi
   Swap:         3.2Gi        94Mi       3.1Gi
   Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
   dm-0            256.00         0.00      5058.00          0       5058
   incoming/active                         T   5 10 20 40 80 160 320 640 1280 1280+
                                   TOTAL 168 168  0  0  0  0   0   0   0    0     0
                               yahoo.com  65  65  0  0  0  0   0   0   0    0     0
                               gmail.com  39  39  0  0  0  0   0   0   0    0     0
                             comcast.net  12  12  0  0  0  0   0   0   0    0     0
   deferred                                 T  5 10 20 40 80 160 320 640 1280 1280+
                                     TOTAL 56 53  0  0  3  0   0   0   0    0     0
                               comcast.net 48 48  0  0  0  0   0   0   0    0     0
                                  satab.mx  1  0  0  0  1  0   0   0   0    0     0
                                 gmail.com  1  0  0  0  1  0   0   0   0    0     0
   smtp procs: 49, 52, 48, 58 = 207

Plenty of memory, no swapping, I/O tps is moderate, the active queue
size is low, processor loading across the four cores is low, smtp
procs increased to 207, and there is a bit of throttling from
Comcast.  We will increase the incoming load on the mail server for
the run on Tuesday morning.  I expect I/O tps will remain the same,
smtp processes will increase, processor loading will increase and
email throughput will increase -- we will see.
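
For anyone curious, the snapshot logger itself is roughly the
following (a sketch; it assumes free -h, iostat from sysstat and
qshape are what produce the blocks above, and the log path and the
count_smtp_procs helper name are placeholders for our own script):

   #!/bin/sh
   # Append one snapshot to the tuning log about every 10 seconds.
   # "iostat -y 10 1" measures and reports one 10-second interval,
   # so it also paces the loop.
   LOG=/var/log/mail-tuning.log
   while :; do
       {
           uptime
           free -h
           iostat -dky 10 1 | grep -E '^(Device|dm-)'
           qshape incoming active     # age buckets for incoming+active
           qshape deferred            # age buckets for deferred
           count_smtp_procs           # "smtp procs: a, b, c, d = total"
       } >>"$LOG" 2>&1
   done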

Thanks for the feedback!

Blessings, Greg
www.RayStedman.org


On Sun, Jul 11, 2021 at 7:04 PM Viktor Dukhovni
<postfix-us...@dukhovni.org> wrote:
>
> On Sat, Jul 10, 2021 at 07:34:15AM -0700, Greg Sims wrote:
>
> > I am tuning the performance of our mail server.    We collect
> > information in our logs every 10 seconds including qshape, iostat,
> > free and mpstat.  It seems that the maxproc parameter in master.cf is
> > important for us as we can see the size of the queues decrease as we
> > increase maxproc -- as expected.
>
> Running "qshape" every 10s does seem rather excessive.  Two employers
> and over a decade ago I had a "qshaped" that kept state between scans
> avoiding rereading the same queue file twice, and would generate an
> alert if some age bucket exceeded a threshold occupancy.  I never
> released "qshaped" to the world at large.
>
> If you are running "qshape" to measure queue size, use "qshape -s" to
> count senders, so that messages with many recipients don't distort the
> numbers.
>
> My take is that what matters is latency and so long as most messages
> leave the queue quickly the queue size is not a problem.
>
> I don't typically raise max_proc across the board, but rather only raise the
> process limits for smtpd(8) and perhaps smtp(8) (given sufficient
> network capacity).  Delivery via local(8) and pipe(8) tends to be
> CPU-intensive, and I don't want high process counts there.
>
> > We are currently running with qshape showing 1,000 emails in the
> > incoming/active queue maximum -- all less than 5 minutes.
>
> That state of affairs sounds fine.  Rather than monitoring queue size,
> it may be better to monitor smoothed running averages of the "b", "c"
> and "d" times in:
>
>     delays=a/b/c/d
>
> --
>     Viktor.
