On Monday 02 February 2009 17:45:15 Victor Duchovni wrote:
<snip>
> > Apparently nothing in particular:
> >
> > http://pastebin.ca/1325397
>
> Jan 25 00:56:53 hotell01 postfix/qmgr[738]: B75CA147967:
> from=<aaaa...@hotell01.pht.no>, size=29074, nrcpt=1 (queue active)
>
> The delivery agent scheduled to handle this message locked up for 5
> hours and gave up. It got stuck before reporting "busy" to the master
> daemon, so no other smtp(8) processes were allocated.

OK. There is nothing special about that line, right? You just deduce it from
the watchdog timeout and from this being the last qmgr line at the predicted
time?

How about the log from this Sun/Mon then? I cannot see the same pattern.
Here the last qmgr line is at 00:47:32 and the last smtp line at 00:48:12,
and I restarted at 09:12. Should there not have been a watchdog timeout
around 05:47? There is nothing of the kind...

http://pastebin.ca/1326125

> > our Munin http://munin.projects.linpro.no/
> > has lost the fine details that far back but there is a regular high peak
> > on iostat just before 01:00 every night. Backup related I guess.
> >
> > both today and Jan 25 was a monday, so I had a look at cron.weekly which
> > runs
>
> Perhaps your system runs out of resources during backup, and perhaps when
> this happens the system behaves in ways it should not.
>
> I am guessing a "ready" indication arrived for the private/smtp socket,
> but accept() blocked indefinitely. This would then be a kernel issue.

If that is the case, would an acceptable workaround be to amend the backup
scripts to restart postfix at the end? Or could that still lose mail?

> If this happens again, you need to catch the stuck smtp(8) *before* the
> watchdog timer expires, and get a core file via "gcore". Then report a
> stack trace of the process.

Great. Will do.

Gaute

-- 
Programmerer - Pixelhospitalet AS
Prinsessealleen 50, 0276 Oslo
Tlf. 24 12 97 81 - 9074 7344
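
For the timing question above, a minimal Python sketch of the watchdog arithmetic. It assumes the 5-hour lockup corresponds to Postfix's default daemon_timeout of 18000s (the thread only says "5 hours"), and the helper name watchdog_deadline is made up for illustration:

```python
from datetime import datetime, timedelta

# Assumption: the watchdog fires after Postfix's default daemon_timeout
# (18000 seconds = 5 hours); adjust if main.cf overrides it.
DAEMON_TIMEOUT = timedelta(seconds=18000)


def watchdog_deadline(last_activity: str) -> str:
    """Given the HH:MM:SS of the last log line, return the HH:MM:SS at which
    the watchdog would be expected to fire (wraps past midnight)."""
    started = datetime.strptime(last_activity, "%H:%M:%S")
    return (started + DAEMON_TIMEOUT).strftime("%H:%M:%S")


# Jan 25 incident: last qmgr line at 00:56:53
print(watchdog_deadline("00:56:53"))  # 05:56:53
# Sun/Mon incident: last qmgr line at 00:47:32
print(watchdog_deadline("00:47:32"))  # 05:47:32
```

So if the second hang were the same failure, a watchdog message would be expected in the log at about 05:47:32, well before the 09:12 restart.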