Sudden degradation in Postfix performance.

Jonathan K. Tullett Sun, 21 Dec 2014 01:14:57 -0800

Greetings,

I've been using Postfix for many years - since about 2002 - and I've
finally come across a problem I've not been able to resolve by searching
online or by tapping into my personal network. So I have come to you all
for help.


I have two machines:
Machine A: My primary box, 8 core Xeon 2.27GHz w/24GB RAM, primarily running
Postfix 2.6.6 (SLES 6.6 distro)
Machine B: A test box, 16 core Xeon 2.2GHz w/16GB RAM, primarily running
Apache, RabbitMQ, Memcached and finally Postfix 2.9 (Ubuntu 12.04).

Machine A is used to distribute to a couple of double-opt-in mailing lists
a week, total recipients between 30,000 and 180,000.  The 'sendmail' binary
is used to inject messages into the queue from a distribution manager.

The setup on the server is simple: Postfix with two header checks to
prepend a List-Unsubscribe and a Precedence header, and an OpenDKIM
milter for DKIM signing. Nothing other than that.

Prior to the last week of October, the distribution manager was able to
inject around 25 messages (full size - about 70k each) a second into the
maildrop queue on Machine A.

Since the end of October, that number has dropped to 16 a second on a good
day.

I wrote a test script (a basic for-loop which sends a 1 line, 500 byte
email) and disabled the OpenDKIM milter and the header_checks. With both
disabled, it took 12.75 seconds to inject 500 messages onto Machine A.
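
A timing loop of this kind might look like the sketch below. It is
illustrative only: the function names (build_message, time_injection), the
header values and the filler body are assumptions, and Python stands in
for whatever language the real script uses.

```python
import subprocess
import time


def build_message(sender, recipient, body_bytes=500):
    """Build a minimal message: three headers, a blank line, one body line."""
    headers = (
        f"From: {sender}\n"
        f"To: {recipient}\n"
        "Subject: injection timing test\n"
        "\n"
    )
    # One line of filler, roughly body_bytes long (hypothetical test payload).
    body = "x" * body_bytes + "\n"
    return (headers + body).encode("ascii")


def time_injection(count, command, message):
    """Run `command` once per message, feeding the message on stdin.

    Returns (elapsed_seconds, messages_per_second).
    """
    start = time.perf_counter()
    for _ in range(count):
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, input=message, stdout=subprocess.DEVNULL, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, count / elapsed
```

On a Postfix host the command could be something like
["/usr/sbin/sendmail", "-i", "-f", "sender@example.com", "sink@example.com"],
where the binary path and both addresses are placeholders. Spawning one
process per message mirrors how a for-loop around the 'sendmail' binary
behaves, so the figure includes process start-up cost as well as queue
file creation.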

As a test, I ran exactly the same script on Machine B. It injected 1000
messages (about 500 bytes in size) into the maildrop queue in 4.95 seconds.

(I appreciate Machine B is slightly higher spec, but I wouldn't expect such
disparity!)
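
To put the two runs on the same footing, the timings above can be
converted to messages per second. The snippet below is only arithmetic on
the numbers already quoted; the helper name rate is an assumption.

```python
def rate(messages, seconds):
    """Messages injected per second."""
    return messages / seconds


rate_a = rate(500, 12.75)   # Machine A: 500 messages in 12.75 s
rate_b = rate(1000, 4.95)   # Machine B: 1000 messages in 4.95 s

print(f"Machine A: {rate_a:.1f} msg/s")       # 39.2
print(f"Machine B: {rate_b:.1f} msg/s")       # 202.0
print(f"B is {rate_b / rate_a:.1f}x faster")  # 5.2
```

So with the small test message, Machine B injects roughly five times as
many messages per second as Machine A.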

I ran qshape during the last mailing on Machine A, and the machine was able
to send mails out as fast as it received them; there was no congestion in
any of the queues (maildrop, incoming, outgoing, etc).

I have no machine stats prior to October - I only came onto the project
last week - so I don't know what (if anything) changed that week to
cause performance to drop so suddenly.

I have run read/write tests on both disks - Machine A and B do about
500Mb/second reads, and 380Mb/second writes; all looks OK.

I'm not sure why SLES 6.6 was chosen as it was a new build in August, but
I know that only Postfix 2.6.6 is officially available in the repo for that
distribution. I have 2.11.3 built and ready to go on that machine but would
prefer not to just upgrade on the off-chance it'll 'fix' the problem when
there may be something I'm missing entirely.

Have there been huge improvements to the efficiency of the code base
between 2.6 and 2.9 (or 2.11)?  Does anyone have suggestions on where else
I can look for the cause?

Thank you in advance for any help you can provide.

--
Jonathan K. Tullett
