On 8/6/20 10:13 am, Jason Simms wrote:

> Later this month, I will have to bring down, patch, and reboot all nodes in our cluster for maintenance. The two options available to set nodes into a maintenance mode seem to be either: 1) creating a system-wide reservation, or 2) setting all nodes into a DRAIN state.

We use both. :-)

So for cases where we need a system-wide outage, we put maintenance reservations in place ahead of time to ensure the system is drained by the start of the maintenance window.
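A minimal sketch of that workflow (the reservation name, times, and duration here are illustrative, not our actual values):

```shell
# Create a system-wide maintenance reservation covering all nodes.
# The "maint" flag marks it as a maintenance reservation; "ignore_jobs"
# allows it to be created even though jobs are running on those nodes.
scontrol create reservation reservationname=maint_aug2020 \
    starttime=2020-08-25T08:00:00 duration=08:00:00 \
    user=root flags=maint,ignore_jobs nodes=ALL
```

With that in place, the backfill scheduler won't start any job that could not finish before the reservation begins, so the system drains itself naturally.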

But for rolling upgrades we will build a new image, set the nodes to use it, and then do something like:

scontrol reboot ASAP nextstate=resume reason="Rolling upgrade" [nodes]

That lets running jobs complete while draining the nodes; once idle, each node reboots into the new image and resumes itself after it comes back up and slurmd has started and checked in.
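One way to watch the rolling reboot progress (the output format string is just one reasonable choice):

```shell
# Show each node's name, compact state, and reason field.
# Nodes with a reboot queued carry an "@" suffix on their state
# (e.g. "mix@"); rebooted nodes return to idle/alloc on their own.
sinfo -N -o "%N %t %E"
```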

We use the same mechanism when we need to reboot nodes for other maintenance activities, for example when huge pages are too fragmented and the only way to reclaim them is to reboot the node (we run these checks in the node epilog).
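A toy sketch of the shape of such an epilog check — the threshold, the sysfs path, and the final `scontrol` call are illustrative assumptions, not our actual script:

```shell
#!/bin/sh
# Decide whether the free huge-page count has fallen below a floor.
needs_reboot() {
    free_pages=$1
    min_free=$2
    [ "$free_pages" -lt "$min_free" ]
}

# In a real epilog you would read the live value, e.g.:
#   free_hp=$(cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages)
# and on a hit queue the reboot (running jobs still finish first):
#   scontrol reboot ASAP nextstate=resume reason="Huge pages fragmented" "$(hostname -s)"
# Here the decision is demonstrated with fixed sample numbers.
if needs_reboot 12 64; then
    echo "reboot needed"
else
    echo "huge pages ok"
fi
```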

We paid for enhancements to Slurm 18.08 to ensure that slurmctld takes these node states into account when scheduling jobs, so that large jobs (as in requiring most of the nodes in the system) do not lose their scheduling window when a node has to be rebooted for this reason.

All the best,
Chris
--
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
