On 20/1/23 3:51 am, Stefan Staeglich wrote:
> But someone who is actually using an UnkillableStepProgram stated the opposite
> (that it's executed on the controller nodes). Are you aware of any change
> between Slurm releases? Maybe one of the two parts is just a leftover. Are you
> using an UnkillableStepProgram?
Yes, we've been using it for years on 7 different systems in my time here.
It runs on the compute nodes and collects troubleshooting info for us
when a job fails to die in an allowed time.
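
For illustration only, here is a minimal sketch of the kind of program
that could be pointed to by UnkillableStepProgram in slurm.conf (the
"allowed time" being UnkillableStepTimeout). It is not the script
described above. It assumes a Linux compute node with /proc mounted,
assumes that writing the report to stdout is acceptable, and treats the
SLURM_JOB_ID environment variable as possibly unset, since whether it is
exported to this program may depend on the Slurm release. The function
names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical UnkillableStepProgram sketch: report processes stuck in D state."""
import glob
import os
import socket
import time


def parse_stat_line(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the first '(' and the last ')'.
    """
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, comm) for every process in uninterruptible sleep (state D)."""
    found = []
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "stat")):
        try:
            with open(path) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were looking, or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


def build_report(job_id, processes, host, when):
    """Format the collected information as plain text, one process per line."""
    lines = [f"unkillable step report host={host} job={job_id} time={when}"]
    for pid, comm in processes:
        lines.append(f"  pid={pid} comm={comm} state=D")
    if not processes:
        lines.append("  no processes in D state found")
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumption: SLURM_JOB_ID may not be set for this program on every release.
    job = os.environ.get("SLURM_JOB_ID", "unknown")
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    print(build_report(job, uninterruptible_processes(), socket.gethostname(), stamp))
```

Processes in D state are a common reason a step cannot be killed (for
example, hung filesystem I/O), which is why the sketch looks for them;
a real site script would likely gather more than this.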
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA