There was no outage. The queue filled faster than the processes could work through it.
I do not know which limit to increase to accommodate such bursts of traffic. I did find 27 instances of this block of info in the logs:

postfix/qmgr[PID]: QUEUE_ID: from=<sender>, size=1370, nrcpt=1 (queue active)
postfix/qmgr[PID]: warning: mail for [127.0.0.1]:10024 is using up NUMBER of NUMBER active queue entries
postfix/qmgr[PID]: warning: you may need to reduce smtp-amavis connect and helo timeouts
postfix/qmgr[PID]: warning: so that Postfix quickly skips unavailable hosts
postfix/qmgr[PID]: warning: you may need to increase the main.cf minimal_backoff_time and maximal_backoff_time
postfix/qmgr[PID]: warning: so that Postfix wastes less time on undeliverable mail
postfix/qmgr[PID]: warning: you may need to increase the master.cf smtp-amavis process limit
postfix/qmgr[PID]: warning: please avoid flushing the whole queue when you have
postfix/qmgr[PID]: warning: lots of deferred mail, that is bad for performance
postfix/qmgr[PID]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0

From postconf:

minimal_backoff_time = 300s
maximal_backoff_time = 4000s
smtp_helo_timeout = 300s

But where do I find smtp-amavis connect timeout? Is it the milter_connect_timeout?

From: Wietse Venema <wie...@porcupine.org>
Date: Monday, January 23, 2023 at 11:28
To: Daniel White <daniel.e.wh...@nasa.gov>
Cc: Postfix users <postfix-users@postfix.org>
Subject: [EXTERNAL] Re: Mail queue took 3 hours to recover from a flood. Suggestions ?

White, Daniel E. (GSFC-770.0)[AEGIS]:
Around 12000 messages. The queue went from ~3000 to over 12000 in about 30 minutes and then took 3 hours to grind through all of them. I am still trying to determine if this was an accident or not. The source claims it was not intentionally malicious.
Some postconf values:

default_destination_concurrency_failed_cohort_limit = 1
default_destination_concurrency_limit = 20
default_process_limit = 100

I did not see anything at http://www.postfix.org/TUNING_README.html that looked like it would help, but then we are operating on a skeleton crew, and I do not have the luxury to spend time digging into the details.

When a message was not delivered for 30min because of an outage, then it will take 30min before Postfix tries to deliver that message again. So it will take at least an hour to clear the queue, more depending on how much additional mail was queued in the meantime.

Without further details there can be no useful help.

Wietse
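
The retry reasoning above can be illustrated with a small back-of-the-envelope sketch. It is a simplified model, not Postfix's actual scheduler: it assumes the next retry delay is roughly the time the message has already spent in the queue, clamped between minimal_backoff_time and maximal_backoff_time. The function names are hypothetical; the 300s and 4000s defaults and the 12000-message, 3-hour figures are the ones quoted in this thread.

```python
# Simplified model of the deferred-mail retry delay described above.
# Assumption: delay ~= time already spent in the queue, clamped to the
# configured backoff window. This is an illustration, not Postfix code.

def next_retry_delay(age_s, min_backoff_s=300, max_backoff_s=4000):
    """Return the approximate wait (seconds) before the next delivery attempt."""
    return max(min_backoff_s, min(age_s, max_backoff_s))


def drain_rate(messages, seconds):
    """Average number of messages cleared per second over the whole period."""
    return messages / seconds


if __name__ == "__main__":
    # A message that has been queued for 30 minutes.
    print("next retry in", next_retry_delay(30 * 60), "seconds")
    # Roughly 12000 messages cleared in about 3 hours, as reported in the thread.
    print("average drain rate: %.2f msg/s" % drain_rate(12000, 3 * 3600))
```

Under this model a message that has waited 30 minutes is retried about 30 minutes later, which matches the "at least an hour" estimate. The average drain rate only summarizes the reported numbers; it says nothing about which limit was the bottleneck.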