On 8/6/20 10:13 am, Jason Simms wrote:
> Later this month, I will have to bring down, patch, and reboot all nodes
> in our cluster for maintenance. The two options available to set nodes
> into a maintenance mode seem to be either: 1) creating a system-wide
> reservation, or 2) setting all nodes into a DRAIN state.
We use both. :-)
So for cases where we need a system-wide outage for some reason, we
put reservations in place in advance to ensure the system is drained for
the maintenance.
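A maintenance reservation of that kind might look like the following (a hedged sketch; the reservation name, start time, and duration here are placeholders, not values from the original post):

```shell
# Create a system-wide maintenance reservation. The MAINT and IGNORE_JOBS
# flags mark it as maintenance and let it be created over running jobs;
# the scheduler then avoids starting jobs that would overlap the window.
scontrol create reservation reservationname=maint_aug2020 \
    starttime=2020-08-24T08:00:00 duration=08:00:00 \
    users=root flags=maint,ignore_jobs nodes=ALL

# When the work is done, release the nodes again:
scontrol delete reservationname=maint_aug2020
```

Because the reservation is created well in advance, jobs whose time limits would run into the window simply don't start, so the system drains itself by the start time.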
But for rolling upgrades we build a new image, set the nodes to use it,
and then do something like:
scontrol reboot ASAP nextstate=resume reason="Rolling upgrade" [nodes]
That lets running jobs complete while the nodes drain; once idle, each
node reboots into the new image and resumes itself once it's back up and
slurmd has started and checked in.
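While that's in flight, progress can be watched with something like this (the node name is a placeholder):

```shell
# List nodes that are down, drained, or draining, with the reason we
# set on the reboot command:
sinfo -R

# Inspect a single node; pending-reboot and drain flags appear in State=:
scontrol show node node001 | grep -E 'State=|Reason='
```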
We use the same mechanism when we need to reboot nodes for other
maintenance activities, say when huge pages are too fragmented and the
only way to reclaim them is to reboot the node (these checks happen in
the node epilog).
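The original post doesn't show its epilog, but the huge-page check could be sketched roughly like this (the threshold, function names, and reason string are all assumptions for illustration):

```shell
#!/bin/bash
# Hypothetical node-epilog fragment: if free huge pages have dropped too
# low (i.e. they're fragmented or consumed and can't be reclaimed), queue
# a reboot for when the node next goes idle, using the same mechanism as
# the rolling-upgrade command above.

MIN_FREE_HUGEPAGES=${MIN_FREE_HUGEPAGES:-64}   # assumed threshold

# Print the HugePages_Free count from a meminfo-style file
# (defaults to /proc/meminfo).
free_hugepages() {
    awk '/^HugePages_Free:/ {print $2}' "${1:-/proc/meminfo}"
}

# Intended to be called from the node epilog.
check_hugepages() {
    if [ "$(free_hugepages "$1")" -lt "$MIN_FREE_HUGEPAGES" ]; then
        scontrol reboot ASAP nextstate=resume \
            reason="hugepages fragmented" "$(hostname -s)"
    fi
}
```

In a real epilog, `check_hugepages` would run unconditionally at the end of every job; the node keeps accepting work until it drains, then reboots and resumes itself.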
We paid for enhancements to Slurm 18.08 to ensure that slurmctld took
these node states into account when scheduling jobs, so that large jobs
(as in requiring most of the nodes in the system) do not lose their
scheduling window when a node has to be rebooted for this reason.
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA