Hi all,

At the Slurm User Group I mentioned how to tell the kernel, from your unkillable step script, to dump information about stuck processes to the kernel log buffer (seen via dmesg and hopefully syslog'd somewhere useful for you).

echo w > /proc/sysrq-trigger

That's it. ;-) You'll probably also want to echo something useful to /dev/kmsg beforehand to record which job ID triggered it.

The 'echo' will block until the kernel completes the writes, which may take a few seconds if you've got a lot of stuck processes.
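
For what it's worth, here is a rough sketch of how the two steps might fit together in an unkillable step script. The function name, the message text and the KMSG_PATH/SYSRQ_PATH overrides are my own inventions for illustration (the overrides just let you try it against ordinary files without root), and it assumes the job ID is passed in as the first argument.

```shell
#!/bin/sh
# Hypothetical helper for an unkillable step script; names are illustrative.
# KMSG_PATH and SYSRQ_PATH are assumed overrides so it can be dry-run
# against ordinary files instead of the real kernel interfaces.
dump_blocked_tasks() {
    job_id="${1:-unknown}"
    kmsg="${KMSG_PATH:-/dev/kmsg}"
    sysrq="${SYSRQ_PATH:-/proc/sysrq-trigger}"

    # Tag the kernel log first so the dump that follows can be tied to a job.
    echo "unkillable-step: job ${job_id}: dumping blocked tasks" > "$kmsg"

    # 'w' asks the kernel to dump tasks in uninterruptible (blocked) state.
    # This write blocks until the kernel has finished logging.
    echo w > "$sysrq"
}
```

The script would then end with something like dump_blocked_tasks "$SLURM_JOB_ID", assuming your Slurm version exports SLURM_JOB_ID to the unkillable step program; if it doesn't, the message just says "unknown".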

Hope this is useful!

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
