On 6/16/20 8:16 am, David Baker wrote:

We are running Slurm v19.05.5 and I am experimenting with the scontrol reboot command. I find that compute nodes reboot, but they are not returned to service. Rather, they remain down following the reboot.

How are you using "scontrol reboot"?

We do:

scontrol reboot ASAP nextstate=resume reason=$REASON $NODE

Which works for us (and we have health checks in our epilog that can trigger this for known issues like running low on unfragmented huge pages).
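
For anyone wanting to try the same form of the command safely, here is a minimal sketch of a wrapper around it. The function name reboot_node and the DRY_RUN variable are hypothetical and not part of Slurm; only the scontrol invocation itself is taken from the command above. By default it just prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: reboot_node and DRY_RUN are illustrative names, not Slurm features.
# With DRY_RUN=1 (the default here) the command is printed instead of executed.

reboot_node() {
    node=$1
    reason=$2
    # Same form as the command quoted above: drain the node ASAP, reboot it,
    # and ask Slurm to resume it once slurmd registers again.
    set -- scontrol reboot ASAP nextstate=resume "reason=$reason" "$node"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Calling `reboot_node node001 hugepages` with DRY_RUN unset prints the scontrol line for review; setting DRY_RUN=0 runs it for real. Afterwards, `scontrol show node node001` shows the node's State and Reason, which is one way to check whether it came back into service.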

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
