Hello Chris,

Thank you for your comments. The scontrol reboot command is now working as expected.

Best regards,
David

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Christopher Samuel <ch...@csamuel.org>
Sent: 16 June 2020 18:16
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Nodes do not return to service after scontrol reboot

On 6/16/20 8:16 am, David Baker wrote:

> We are running Slurm v19.05.5 and I am experimenting with the *scontrol
> reboot* command. I find that compute nodes reboot, but they are not
> returned to service. Rather they remain down following the reboot.

How are you using "scontrol reboot"?

We do:

scontrol reboot ASAP nextstate=resume reason=$REASON $NODE

Which works for us (and we have health checks in our epilog that can trigger this for known issues like running low on unfragmented huge pages).

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
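
For anyone finding this thread later, the command Chris quotes can be wrapped in a small shell function. This is only a sketch: the function name reboot_node and the DRY_RUN variable are hypothetical and not part of Slurm, and the node name and reason are placeholders. By default it prints the command rather than running it, so it can be tried on a machine without Slurm.

```shell
#!/bin/sh
# Sketch only: wraps the command quoted in the thread.
# reboot_node and DRY_RUN are hypothetical names, not part of Slurm.
reboot_node() {
    node="$1"      # node name or node list (placeholder, e.g. node001)
    reason="$2"    # free-text reason recorded against the node

    # ASAP drains the node so no new jobs start on it before the reboot;
    # nextstate=resume asks Slurm to return it to service afterwards.
    set -- scontrol reboot ASAP nextstate=resume "reason=$reason" "$node"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"      # dry run: print the command instead of running it
    else
        "$@"           # real run: requires scontrol and admin rights
    fi
}
```

Calling `reboot_node node001 maintenance` prints the command that would be issued. Setting DRY_RUN=0 in the environment first would run it for real, assuming scontrol is on the PATH and the caller is allowed to reboot nodes.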