Greetings,

I've been using Postfix for many years - since about 2002 - and I've finally come across a problem I've not been able to resolve by searching online or by tapping into my personal network. So I have come to you all for help.
I have two machines:

Machine A: My primary box. 8 core Xeon 2.27GHz, 24GB RAM, primarily running Postfix 2.6.6 (SLES 6.6 distro).
Machine B: A test box. 16 core Xeon 2.2GHz, 16GB RAM, primarily running Apache, RabbitMQ, Memcached and finally Postfix 2.9 (Ubuntu 12.04).

Machine A is used to distribute to a couple of double-opt-in mailing lists a week, with between 30,000 and 180,000 recipients in total. A distribution manager injects messages into the queue using the 'sendmail' binary. The setup on the server is simple: Postfix with two header checks to prepend a List-Unsubscribe and a Precedence header, and an OpenDKIM milter for DKIM signing. Nothing other than that.

Prior to the last week of October it was possible, using the distribution manager, to inject around 25 messages a second (full size - about 70k each) into the maildrop queue on Machine A. Since the end of October, that number has dropped to 16 a second on a good day.

I wrote a test script (a basic for-loop which sends a 1 line, 500 byte email) and disabled both the OpenDKIM milter and the header_checks. It took 12.75 seconds to inject 500 messages on Machine A. As a test, I ran exactly the same script on Machine B. It injected 1000 messages (about 500 bytes in size) into the maildrop queue in 4.95 seconds. (I appreciate Machine B is slightly higher spec, but I wouldn't expect such disparity!)

I ran qshape during the last mailing on Machine A, and the machine was able to send mails out as fast as it received them; there was no congestion in any of the queues (maildrop, incoming, outgoing, etc).

I have no machine stats prior to October - I only came onto the project last week - so I don't know what (if anything) changed that week to cause performance to drop so suddenly.

I have run read/write tests on the disks of both machines - Machine A and B both do about 500MB/second reads and 380MB/second writes; all looks OK.
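The injection test described above can be sketched roughly as follows. This is an illustration in Python, not the original script: the addresses, the sendmail path, and all function names are assumptions made for the example. It starts one sendmail process per message, as a simple for-loop would, and reports the elapsed time and the rate.

```python
import subprocess
import sys
import time


def build_message(sender, recipient, body_size=500):
    """Return a small test message as bytes, with a one-line body of body_size characters."""
    headers = (
        f"From: {sender}\n"
        f"To: {recipient}\n"
        "Subject: injection timing test\n"
        "\n"
    )
    body = "x" * body_size + "\n"
    return (headers + body).encode("ascii")


def time_injection(count, command, message):
    """Pipe message to command count times, one process per message.

    Returns a tuple (elapsed_seconds, messages_per_second).
    """
    start = time.perf_counter()
    for _ in range(count):
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, input=message, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, count / elapsed


def run_sendmail_test(count=500):
    """Time injection through the local sendmail binary.

    The path and both addresses are assumptions; adjust them for the
    machine under test before calling this function.
    """
    sender = "sender@example.com"
    recipient = "test@example.com"
    message = build_message(sender, recipient)
    command = ["/usr/sbin/sendmail", "-f", sender, recipient]
    elapsed, rate = time_injection(count, command, message)
    print(f"{count} messages in {elapsed:.2f}s ({rate:.1f} msg/s)", file=sys.stderr)
    return elapsed, rate
```

Calling run_sendmail_test() on each machine gives directly comparable figures. For reference, the timings quoted above work out to roughly 39 messages a second on Machine A (500 in 12.75 seconds) and roughly 202 a second on Machine B (1000 in 4.95 seconds).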
I'm not sure why SLES 6.6 was chosen, as it was a new build in August, but I know that only Postfix 2.6.6 is officially available in the repo for that distribution. I have 2.11.3 built and ready to go on that machine, but would prefer not to just upgrade on the off-chance it'll 'fix' the problem when there may be something I'm missing entirely.

Have there been huge improvements to the efficiency of the code base between 2.6 and 2.9 (or 2.11)? Does anyone have suggestions on where else I can look for the cause?

Thank you in advance for any help you can provide.

--
Jonathan K. Tullett