On 2023-01-23 at 12:06:34 UTC-0500 (Mon, 23 Jan 2023 17:06:34 +0000)
White, Daniel E. (GSFC-770.0)[AEGIS] <daniel.e.wh...@nasa.gov>
is rumored to have said:
There was no outage.
The queue filled faster than the delivery processes could drain it.
I do not know which limit to increase to accommodate such bursts of
traffic.
I did find 27 instances of this block of info in the logs:
postfix/qmgr[PID]: QUEUE_ID: from=<sender>, size=1370, nrcpt=1 (queue active)
postfix/qmgr[PID]: warning: mail for [127.0.0.1]:10024 is using up NUMBER of NUMBER active queue entries
postfix/qmgr[PID]: warning: you may need to reduce smtp-amavis connect and helo timeouts
postfix/qmgr[PID]: warning: so that Postfix quickly skips unavailable hosts
postfix/qmgr[PID]: warning: you may need to increase the main.cf minimal_backoff_time and maximal_backoff_time
postfix/qmgr[PID]: warning: so that Postfix wastes less time on undeliverable mail
postfix/qmgr[PID]: warning: you may need to increase the master.cf smtp-amavis process limit
postfix/qmgr[PID]: warning: please avoid flushing the whole queue when you have
postfix/qmgr[PID]: warning: lots of deferred mail, that is bad for performance
postfix/qmgr[PID]: warning: to turn off these warnings specify: qmgr_clog_warn_time = 0
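The warnings above map onto a handful of main.cf parameters. A sketch of what that looks like in config form (the values shown are purely illustrative, not recommendations for this site):

```
# main.cf -- illustrative values only; tune for your own traffic.
# Back off longer between retries so qmgr wastes less time on
# undeliverable mail (per the second pair of warnings above):
minimal_backoff_time = 300s
maximal_backoff_time = 4000s
# Set to 0 to suppress the qmgr clog warnings entirely:
# qmgr_clog_warn_time = 0
```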
From postconf:
minimal_backoff_time = 300s
maximal_backoff_time = 4000s
smtp_helo_timeout = 300s
But where do I find the smtp-amavis connect timeout?
Is it the milter_connect_timeout?
No, it appears that you are using Amavisd as an SMTP proxy between two
Postfix smtpd processes, not as a milter. Presumably 'smtp-amavis' is the
post-proxy smtpd, which uses the standard smtpd_* settings (from main.cf
or defaults) unless you override those settings in master.cf. It is
probably more important to make sure that the smtpd instance on the
output side of the proxy has a process limit equal to the one handling
the external connection, or else that will be a bottleneck and you can
get those warnings from qmgr.
I don't believe that setting the process limit on the outbound-side
smtpd service higher than on the inbound side provides any benefit, but
Viktor or Wietse will likely correct me if I'm wrong.
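The matched-process-limit advice can be sketched as a master.cf fragment. The service names, ports (10024/10025), and -o overrides below are the conventional amavisd-new proxy setup, assumed for illustration rather than taken from this thread:

```
# master.cf -- sketch of a conventional Postfix/amavisd proxy path.
# Pre-filter client: hand mail to amavisd on 10024, at most 20 at once.
smtp-amavis unix  -    -    n    -    20    smtp
    -o smtp_data_done_timeout=1200
    -o smtp_send_xforward_command=yes
# Post-filter listener: amavisd reinjects on 10025. Keep its maxproc
# equal to the smtp-amavis limit so reinjection is not the bottleneck.
127.0.0.1:10025 inet  n    -    n    -    20    smtpd
    -o content_filter=
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
```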
From: Wietse Venema <wie...@porcupine.org>
Date: Monday, January 23, 2023 at 11:28
To: Daniel White <daniel.e.wh...@nasa.gov>
Cc: Postfix users <postfix-users@postfix.org>
Subject: [EXTERNAL] Re: Mail queue took 3 hours to recover from a flood. Suggestions?
White, Daniel E. (GSFC-770.0)[AEGIS]:
Around 12000 messages.
The queue went from ~3000 to over 12000 in about 30 minutes and then
took 3 hours to grind through all of them.
I am still trying to determine if this was an accident or not.
The source claims it was not intentionally malicious.
Some postconf values:
default_destination_concurrency_failed_cohort_limit = 1
default_destination_concurrency_limit = 20
default_process_limit = 100
I did not see anything at
http://www.postfix.org/TUNING_README.html
that looked like it would help, but we are operating with a skeleton
crew, and I do not have the luxury of digging into the details.
When a message is not delivered for 30min because of an outage,
it will take 30min before Postfix tries to deliver that message
again. So it will take at least an hour to clear the queue, more
depending on how much additional mail was queued in the meantime.
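Wietse's arithmetic can be sketched numerically. As a simplification (the real queue manager's timing also depends on queue_run_delay and deferred-queue scan timing), assume Postfix doubles a deferred message's backoff from minimal_backoff_time up to a cap of maximal_backoff_time:

```python
# Rough model of Postfix deferred-mail retry timing. This is a
# simplification: actual retries also depend on queue_run_delay and
# when the deferred queue is scanned.

def retry_offsets(minimal=300, maximal=4000, horizon=7200):
    """Seconds after the first failure at which retries would occur."""
    offsets, t, backoff = [], 0, minimal
    while t + backoff <= horizon:
        t += backoff
        offsets.append(t)
        backoff = min(backoff * 2, maximal)  # double, capped at maximal
    return offsets

# With this poster's values (300s / 4000s) over a two-hour window:
print(retry_offsets())  # [300, 900, 2100, 4500]
```

Under this model a message deferred early in the flood gets only a handful of retries in the first two hours, which is consistent with a 12000-message queue taking hours to drain.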
Without further details there can be no useful help.
Wietse
--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire