> netstat or lsof? Only the Postfix queue manager knows what deliveries
> are in progress, and it has never evolved a 'live status' API. None
> of the Postfix daemons has a status query API, it just isn't part of
> the architecture.
I created a way to watch the number of processes that exist for each
of our four randmap transports (c0, c1, c2, c3) by using:

    ps -f -u postfix | grep smtp_helo_name=mail01-cx | wc -l

(The loop behind this is sketched at the end of this message.) The
script generates one line that looks like this when there is no load:

    smtp procs: 0, 0, 0, 0 = 0

at the time we start loading the server with outgoing email:

    smtp procs: 29, 26, 30, 31 = 116

and increases to the following under maximum load:

    smtp procs: 49, 52, 48, 58 = 207

I can see the "slowly rises over time" mentioned by Wietse. I'm not
sure how this relates to maxproc in master.cf, where each of the
randmap transports is set to 128 (see the master.cf sketch at the end
of this message).

> That state of affairs sounds fine. Rather than monitoring queue size,
> it may be better to monitor smoothed running averages of the "b", "c"
> and "d" times in:
>
>     delays=a/b/c/d

The first thing I look at is a set of stats by ISP: Email Sent, Ave
Delay, Max Delay and conn use=. We are seeing an Ave Delay of 1-2
seconds and conn use= at 80% for the large ISPs. If this is not the
case, I dig into why not. If maxproc is too small for the randmap
transports, the Ave Delay will increase and our throughput will
decrease. We can also see a dramatic increase (10 times) in
transactions per second to our io subsystem (SSDs). A good run will
see steady transactions per second over time, as was the case this
morning.

Here is the maximum-load log interval from this morning (we get a
snapshot like this once every 10 seconds in our logs -- this logging
does not noticeably change server performance):

01:03:57 up 26 days, 19:11,  0 users,  load average: 0.52, 0.33, 0.13

              total        used        free      shared  buff/cache   available
Mem:          3.6Gi       642Mi       426Mi       0.0Ki       2.6Gi       2.8Gi
Swap:         3.2Gi        94Mi       3.1Gi

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
dm-0            256.00         0.00      5058.00          0       5058

incoming/active     T   5  10  20  40  80 160 320 640 1280 1280+
          TOTAL   168 168   0   0   0   0   0   0   0    0     0
      yahoo.com    65  65   0   0   0   0   0   0   0    0     0
      gmail.com    39  39   0   0   0   0   0   0   0    0     0
    comcast.net    12  12   0   0   0   0   0   0   0    0     0

deferred            T   5  10  20  40  80 160 320 640 1280 1280+
          TOTAL    56  53   0   0   3   0   0   0   0    0     0
    comcast.net    48  48   0   0   0   0   0   0   0    0     0
       satab.mx     1   0   0   0   1   0   0   0   0    0     0
      gmail.com     1   0   0   0   1   0   0   0   0    0     0

smtp procs: 49, 52, 48, 58 = 207

Plenty of memory, no swapping, io tps is moderate, active queue size
is low, processor loading of four cores is low, smtp procs increased
to 207, and there is a bit of throttling from comcast.

We will increase the incoming load on the mail server for the run on
Tuesday morning. I expect io tps will remain the same, smtp processes
will increase, processor loading will increase, and email throughput
will increase -- we will see.
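For reference, the counting loop behind the "smtp procs" line looks
something like this (a sketch, not our exact script; it simply expands
the one-liner above across the four transports):

    #!/bin/sh
    # Sketch: count smtp(8) client processes per randmap transport by
    # matching each transport's smtp_helo_name override in "ps" output.
    total=0
    counts=""
    for x in 0 1 2 3; do
        n=$(ps -f -u postfix | grep -c "smtp_helo_name=mail01-c$x")
        counts="$counts$n, "
        total=$((total + n))
    done
    echo "smtp procs: ${counts%, } = $total"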
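For context, the four transports are plain smtp(8) clones in
master.cf; roughly like this (a sketch -- the 128 in the maxproc
column is the setting discussed above, and the -o overrides carry the
per-transport helo names the script greps for):

    # service type  private unpriv  chroot  wakeup  maxproc command + args
    c0        unix  -       -       n       -       128     smtp
        -o smtp_helo_name=mail01-c0
    c1        unix  -       -       n       -       128     smtp
        -o smtp_helo_name=mail01-c1
    c2        unix  -       -       n       -       128     smtp
        -o smtp_helo_name=mail01-c2
    c3        unix  -       -       n       -       128     smtp
        -o smtp_helo_name=mail01-c3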
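And a sketch of how the smoothed b/c/d running averages Viktor
suggests could be pulled from the log (this assumes the standard
"delays=a/b/c/d" field in delivery log lines; the 0.1 smoothing factor
and the log path are arbitrary choices, adjust to taste):

    # Exponentially smoothed running averages of the b, c and d delays.
    awk 'match($0, /delays=[0-9.]+\/[0-9.]+\/[0-9.]+\/[0-9.]+/) {
        split(substr($0, RSTART + 7, RLENGTH - 7), t, "/")
        if (n++ == 0) { b = t[2]; c = t[3]; d = t[4] }  # seed on first sample
        else {
            b = 0.9 * b + 0.1 * t[2]  # b: time in queue manager
            c = 0.9 * c + 0.1 * t[3]  # c: connection setup time
            d = 0.9 * d + 0.1 * t[4]  # d: message transmission time
        }
        printf "b=%.2f  c=%.2f  d=%.2f\n", b, c, d
    }' /var/log/maillog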
Thanks for the feedback!

Blessings,
Greg
www.RayStedman.org

On Sun, Jul 11, 2021 at 7:04 PM Viktor Dukhovni
<postfix-us...@dukhovni.org> wrote:
>
> On Sat, Jul 10, 2021 at 07:34:15AM -0700, Greg Sims wrote:
>
> > I am tuning the performance of our mail server. We collect
> > information in our logs every 10 seconds including qshape, iostat,
> > free and mpstat. It seems that the maxproc parameter in master.cf is
> > important for us as we can see the size of the queues decrease as we
> > increase maxproc -- as expected.
>
> Running "qshape" every 10s does seem rather excessive. Two employers
> and over a decade ago I had a "qshaped" that kept state between scans,
> avoiding rereading the same queue file twice, and would generate an
> alert if some age bucket exceeded a threshold occupancy. I never
> released "qshaped" to the world at large.
>
> If you are running "qshape" to measure queue size, use "qshape -s" to
> count senders, so that messages with many recipients don't distort the
> numbers.
>
> My take is that what matters is latency, and so long as most messages
> leave the queue quickly, the queue size is not a problem.
>
> I don't typically raise max_proc across the board, but rather only
> raise the process limits for smtpd(8) and perhaps smtp(8) (given
> sufficient network capacity). Delivery via local(8) and pipe(8) tends
> to be CPU-intensive, and I don't want high process counts there.
>
> > We are currently running with qshape showing 1,000 emails in the
> > incoming/active queue maximum -- all less than 5 minutes.
>
> That state of affairs sounds fine. Rather than monitoring queue size,
> it may be better to monitor smoothed running averages of the "b", "c"
> and "d" times in:
>
>     delays=a/b/c/d
>
> --
>     Viktor.