Re: postfix terminating on signal 15

Jeroen van Aart Wed, 26 Aug 2009 15:55:51 -0700

Wietse Venema wrote:

> The 15-minute distance suggests that the system was already in
> trouble long before the qmgr voluntarily exited with status 1.

True, the problems only occurred once the load began blocking processes for extended amounts of time. One of the monitors, which wrote the output of "top" and "free" to a file, I had to run at nice -5 in order to get any useful results at the highest loads.

> When a system is totally hosed, it is unfortunate but understandable
> that the syslog datagram with the error message gets lost.

Yes, I understand. I also didn't trust the timestamps much while the system was under such high load.

> I recommend that you update the monitoring process to identify the
> process that is gobbling up all the system resources.

The monitor mentioned above actually did that successfully. Once it managed to produce useful output at nice -5, the offending (java) process showed an interesting 1500+% CPU usage in top. Interesting, because the server has "only" 4 CPU cores (and 32 GB of RAM).
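For anyone wanting a similar monitor, here is a minimal sketch of one way to log the top CPU consumers alongside "free" output. It assumes a Linux box with procps `ps` and `free`; the log path `/var/tmp/load-watch.log` and the function names are hypothetical. Note that procps `ps` reports %CPU averaged over the process lifetime rather than top's instantaneous figure, and a multithreaded process can exceed 100%.

```python
import subprocess
import time


def parse_ps(output: str, limit: int = 5):
    """Parse `ps -eo pid,pcpu,comm` text into (pid, cpu_percent, command)
    tuples, sorted by CPU usage, highest first."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)  # command name may contain spaces
        if len(fields) < 3:
            continue
        pid, pcpu, comm = fields
        rows.append((int(pid), float(pcpu), comm))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:limit]


def snapshot(logfile: str = "/var/tmp/load-watch.log") -> None:
    """Append a timestamped snapshot of the top CPU consumers and memory
    usage (hypothetical log path) to logfile."""
    ps_out = subprocess.run(["ps", "-eo", "pid,pcpu,comm"],
                            capture_output=True, text=True, check=True).stdout
    free_out = subprocess.run(["free", "-m"],
                              capture_output=True, text=True, check=True).stdout
    with open(logfile, "a") as fh:
        fh.write(time.strftime("=== %Y-%m-%d %H:%M:%S ===\n"))
        for pid, pcpu, comm in parse_ps(ps_out):
            fh.write(f"{pid:>7} {pcpu:6.1f}% {comm}\n")
        fh.write(free_out)
```

Calling `snapshot()` from a cron job started with `nice -n -5` (which requires root) gives it the same priority boost as the monitor described above, so it still gets scheduled when the box is heavily loaded.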

> When I managed systems I had a "watcher" (*) cron job that would
> (*) ftp://ftp.isc.org/usenet/comp.sources.unix/volume11/watcher/

Thanks, I'm checking it out. It'll be a bit of a challenge compiling it on my Debian system.


Greetings,
Jeroen
