There was no outage.
The queue filled faster than the delivery processes could work through it.

I do not know which limit to increase to accommodate such bursts of traffic.

I did find 27 instances of this block of info in the logs:

postfix/qmgr[PID]: QUEUE_ID: from=<sender>, size=1370, nrcpt=1 (queue active)
postfix/qmgr[PID]: warning: mail for [127.0.0.1]:10024 is using up NUMBER of NUMBER active queue entries
postfix/qmgr[PID]: warning: you may need to reduce smtp-amavis connect and helo timeouts
postfix/qmgr[PID]: warning: so that Postfix quickly skips unavailable hosts
postfix/qmgr[PID]: warning: you may need to increase the main.cf minimal_backoff_time and maximal_backoff_time
postfix/qmgr[PID]: warning: so that Postfix wastes less time on undeliverable mail
postfix/qmgr[PID]: warning: you may need to increase the master.cf smtp-amavis process limit
postfix/qmgr[PID]: warning: please avoid flushing the whole queue when you have
postfix/qmgr[PID]: warning: lots of deferred mail, that is bad for performance
postfix/qmgr[PID]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0

From postconf:
minimal_backoff_time = 300s
maximal_backoff_time = 4000s
smtp_helo_timeout = 300s

But where do I find the smtp-amavis connect timeout?

Is it milter_connect_timeout?


From: Wietse Venema <wie...@porcupine.org>
Date: Monday, January 23, 2023 at 11:28
To: Daniel White <daniel.e.wh...@nasa.gov>
Cc: Postfix users <postfix-users@postfix.org>
Subject: [EXTERNAL] Re: Mail queue took 3 hours to recover from a flood. Suggestions?

White, Daniel E. (GSFC-770.0)[AEGIS]:
Around 12000 messages.
The queue went from ~3000 to over 12000 in about 30 minutes and then took 3 
hours to grind through all of them.
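
For a rough sense of scale, the figures above can be turned into approximate rates. This is only back-of-the-envelope arithmetic on the numbers quoted in this message. The helper name is hypothetical, and the drain figure ignores any mail that arrived while the queue was being worked down.

```python
def rate_per_second(messages, seconds):
    """Average number of messages per second over an interval."""
    return messages / seconds

# Queue grew from about 3000 to about 12000 in roughly 30 minutes.
net_fill_rate = rate_per_second(12000 - 3000, 30 * 60)

# About 12000 messages took roughly 3 hours to clear
# (assumption: no significant new mail arrived during that time).
drain_rate = rate_per_second(12000, 3 * 3600)

print(f"net fill rate: {net_fill_rate:.2f} msg/s")
print(f"drain rate:    {drain_rate:.2f} msg/s")
```

On these numbers the queue was growing several times faster than it later drained.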

I am still trying to determine if this was an accident or not.
The source claims it was not intentionally malicious.

Some postconf values:

default_destination_concurrency_failed_cohort_limit = 1
default_destination_concurrency_limit = 20
default_process_limit = 100

I did not see anything at http://www.postfix.org/TUNING_README.html
that looked like it would help, but then we are operating on a skeleton crew,
and I do not have the luxury to spend time digging into the details.


When a message was not delivered for 30min because of an outage,
then it will take 30min before Postfix tries to deliver that message
again. So it will take at least an hour to clear the queue, more
depending on how much additional mail was queued in the meantime.
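
As a rough illustration of the retry timing described above, the sketch below models the wait before the next delivery attempt as the time the message has already spent in the queue, clamped between minimal_backoff_time and maximal_backoff_time (300s and 4000s in the postconf output quoted earlier in this thread). The function name and the clamping model are assumptions made for illustration; this is a simplification, not Postfix's actual code.

```python
def next_retry_delay(age_s, minimal_backoff_s=300, maximal_backoff_s=4000):
    """Simplified model: wait about as long as the message has already
    been queued, but never less than the minimal backoff and never
    more than the maximal backoff (all values in seconds)."""
    return max(minimal_backoff_s, min(age_s, maximal_backoff_s))

# A message that has been undeliverable for 30 minutes waits
# another 30 minutes before the next attempt in this model.
print(next_retry_delay(30 * 60))

# Very young and very old messages hit the configured bounds.
print(next_retry_delay(60))
print(next_retry_delay(3 * 3600))
```

Under this model, mail that sat undelivered for 30 minutes is not retried for another 30 minutes, which is where the estimate of at least an hour comes from.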

Without further details there can be no useful help.

                Wietse
