After a bit more testing I can answer my original question: I was just
too impatient. When the ResumeProgram returns a non-zero exit code,
SLURM doesn't taint the node, i.e., it tries to start it again after a
while. Exactly what I want! :-)
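For anyone who finds this thread later, here is a minimal sketch of a ResumeProgram that signals failure through its exit code. Slurm invokes ResumeProgram with a single hostlist expression (e.g. "node[01-03]"), which the sketch expands with `scontrol show hostnames`. The `power_on()` helper is hypothetical; swap in whatever IPMI or cloud call your site uses. The "Slurm retries later" part is only the behaviour observed above, not something this script enforces.

```python
#!/usr/bin/env python3
# Sketch of a Slurm ResumeProgram, assuming a site-specific power_on() helper.
import subprocess
import sys


def expand_hostlist(expr: str) -> list:
    # `scontrol show hostnames` prints one hostname per line.
    out = subprocess.run(
        ["scontrol", "show", "hostnames", expr],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def power_on(node: str) -> bool:
    # Hypothetical placeholder: replace with your site's power-on mechanism.
    # Should return True on success, False on failure.
    raise NotImplementedError("site-specific power-on goes here")


def resume(nodes, power_on_fn) -> int:
    # Try every node and report the failures on stderr (ends up in Slurm's logs
    # only if your setup captures it). Return 1 if anything failed, else 0.
    failed = [n for n in nodes if not power_on_fn(n)]
    for n in failed:
        print(f"resume failed for {n}", file=sys.stderr)
    return 1 if failed else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # argv[1] is the hostlist expression passed in by slurmctld.
    sys.exit(resume(expand_hostlist(sys.argv[1]), power_on))
```

Returning a non-zero status, rather than silently exiting 0, is what lets Slurm notice the failure in the first place.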
@Lachlan Musicman: My slurm.conf Node and Partition
Paul and Chris,
Thanks for the information. This is the first time I've had a reason to
restart the slurmd processes (instead of just 'scontrol reconfigure')
outside of a maintenance window, and I wanted to be 100% sure before risking
killing all the user jobs on a Friday afternoon.
I'm happy to say