On 8/3/22 8:37 am, Phil Chiu wrote:

Therefore my problem is this - "Reboot all nodes, permitting N nodes to be rebooting simultaneously."

I think currently the only way to do that would be to have a script that does:

* issue the `scontrol reboot ASAP nextstate=resume [...]` for the first N nodes (say 3)
* wait for 1 of them to come back online
* issue the same `scontrol reboot` for another node
* wait for 1 more to come back
* repeat until every node has been rebooted.

This does assume you've got your nodes configured to come back cleanly on a reboot with slurmd up and no manual intervention required (which is what we do).
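
As a rough sketch of what such a script could look like (in Python, and not tested against a real cluster): the loop below keeps at most `max_in_flight` reboots outstanding and tops up as nodes return. The function names, the 30 second polling interval, and in particular the "is it back?" check are my assumptions. The check treats a node as back when `sinfo` reports a plain `idle`, `mixed` or `allocated` state with no flag suffix, and state strings differ between Slurm versions, so adjust that for your site.

```python
import subprocess
import time


def scontrol_reboot(node):
    """Ask Slurm to reboot one node as soon as it is free, then resume it."""
    subprocess.run(
        ["scontrol", "reboot", "ASAP", "nextstate=resume", node],
        check=True,
    )


def node_is_back(node):
    """Assumed check: every state sinfo reports for the node is a plain
    usable one, with no pending-reboot or drain marker attached."""
    out = subprocess.run(
        ["sinfo", "-h", "-n", node, "-o", "%T"],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    return bool(out) and all(s in ("idle", "mixed", "allocated") for s in out)


def rolling_reboot(nodes, max_in_flight=3, reboot=scontrol_reboot,
                   is_back=node_is_back, poll=time.sleep, interval=30):
    """Reboot every node, never having more than max_in_flight rebooting."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    pending = list(nodes)
    in_flight = []
    done = []
    while pending or in_flight:
        # Top up to the limit with nodes that have not been rebooted yet.
        while pending and len(in_flight) < max_in_flight:
            node = pending.pop(0)
            reboot(node)
            in_flight.append(node)
        poll(interval)
        # Retire any node that has come back, freeing a slot for the next one.
        for node in list(in_flight):
            if is_back(node):
                in_flight.remove(node)
                done.append(node)
    return done
```

You would call it with your own node list, e.g. `rolling_reboot(["node01", "node02", "node03", "node04"], max_in_flight=3)` (hypothetical node names). Note that a node which never comes back will hold its slot forever, so you would probably want a timeout on top of this.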

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA

