Wietse Venema wrote:
It's unproductive to kill off Postfix under overload. At the very
least you should increase your 35-second deadline.
Yes, I did increase it to 120 seconds. I understand that just killing
and restarting Postfix is not a solution.
As a test, I switched the monitor to send an alert instead of restarting.
The mail.info logs show nothing very useful. The possibly noteworthy
entries around the time Postfix quit are:
Aug 15 02:55:06 prod101 postfix/master[9402]: warning: process
/usr/lib/postfix/qmgr pid 9582 exit status 1
Then at some point the master process quits (without any mention in the
logs) and the postfix/smtpd processes slowly disappear:
Aug 15 05:13:19 prod101 postfix/smtpd[15568]: warning: problem talking
to server private/anvil: Connection refused
Aug 15 05:13:19 prod101 postfix/smtpd[14684]: lost connection after
CONNECT from unknown[77.41.50.8]
(..)
Aug 15 05:13:19 prod101 postfix/smtpd[14973]: lost connection after
CONNECT from unknown[92.85.166.176]
Aug 15 05:13:19 prod101 postfix/smtpd[11671]: disconnect from
unknown[unknown]
Aug 15 05:13:19 prod101 postfix/smtpd[13910]: lost connection after
CONNECT from unknown[123.21.38.44]
This continues until the logs no longer mention Postfix and a "ps -ef"
shows that it has stopped running. A manual "/etc/init.d/postfix start"
is then required. So the babysitter at least prevents Postfix from being
down for a long time, even though it produces a few "false positives".
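
For reference, the alert-only check can be sketched roughly as below.
This is not my actual monitor script, just a minimal illustration in
Python. The function names, the default host/port and the use of print
as the alert hook are all assumptions; only the 120-second deadline and
the "alert, don't restart" behaviour come from what I described above.

```python
import socket
import time


def smtp_greeting_ok(host="127.0.0.1", port=25, deadline=120.0):
    """Return True if a 220 greeting arrives within `deadline` seconds.

    Hypothetical probe: host, port and the function name are assumptions.
    The deadline covers connecting and reading the banner together.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=deadline) as sock:
            remaining = deadline - (time.monotonic() - start)
            if remaining <= 0:
                return False
            sock.settimeout(remaining)
            banner = sock.recv(512)
    except OSError:
        # Covers refused connections and timeouts alike.
        return False
    return banner.startswith(b"220")


def check_and_alert(probe=smtp_greeting_ok, alert=print):
    """Run the probe once; on failure send an alert, never restart."""
    ok = probe()
    if not ok:
        alert("postfix: no SMTP greeting within deadline")
    return ok


if __name__ == "__main__":
    check_and_alert()
```

Run from cron, this only reports a missing greeting; whether to restart
stays a manual decision.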
There is an entire webpage devoted to how Postfix handles overload
and what recovery mechanisms already exist.
Yes, I read it before contacting the list. I will study it again.
Thanks,
Jeroen