Title: Re: [slurm-dev] Re: Rolling maintenance jobs

I thought that's what scontrol reboot was all about.

https://slurm.schedmd.com/scontrol.html#OPT_reboot

Just point RebootProgram in slurm.conf at a script on shared storage, and modify the script to do whatever you need to do - be that OS upgrades or simple reboots.
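As a rough illustration of that setup, here is a minimal Python sketch that only builds the slurm.conf line and the scontrol command line; it does not run anything. The script path and the node range are made-up placeholders, and the optional ASAP keyword is only accepted by newer Slurm releases, so check the scontrol man page for your version before relying on it.

```python
import shlex

# Hypothetical location on shared storage; not taken from the thread.
REBOOT_PROGRAM = "/shared/admin/node_maintenance.sh"


def slurm_conf_line(program=REBOOT_PROGRAM):
    """Return the slurm.conf setting that points RebootProgram at a script."""
    return f"RebootProgram={program}"


def reboot_command(nodelist, asap=False):
    """Build the argument list for `scontrol reboot` on the given nodes.

    `nodelist` is a Slurm node expression such as "node[001-004]" or "ALL".
    `asap` adds the ASAP keyword (newer Slurm releases only).
    """
    cmd = ["scontrol", "reboot"]
    if asap:
        cmd.append("ASAP")
    cmd.append(nodelist)
    return cmd


if __name__ == "__main__":
    # Dry run: print what would go into slurm.conf and what would be executed.
    print(slurm_conf_line())
    print(shlex.join(reboot_command("node[001-004]")))
```

Printing the command instead of executing it keeps the sketch safe to try on a login node; passing the list to subprocess.run would be the obvious next step.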


On 02/08/2017 11:11, Bjørn-Helge Mevik wrote:
"Golpayegani, Navid (GSFC-6190)" <[email protected]> writes:

Hi all,
  Is there a way to submit maintenance jobs in a rolling fashion? What
I’m thinking is the ability to run a job on every node in a Slurm
cluster/queue in exclusive mode but X at a time.

We do this for rolling upgrades.  Basically, we submit X copies of a
jobscript that asks for exclusive access to any node with a feature
"fixme" (actually, we use "vaskmeg" :).  The jobs are run as root and
specify --nice=-10000 to get highest priority.  They do their job,
remove the "fixme" feature from the node, and then request themselves to
be requeued.

Prior to submitting the jobs, we add the "fixme" feature to all nodes
needing maintenance.

(In reality, our setup is a little more complex, since it includes
reinstalling the OS on the nodes, but the principle is the same.)
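The workflow described above (tag nodes with a feature, submit X exclusive jobs constrained to that feature, untag the node, requeue) might be sketched roughly as follows. This is not the poster's actual script: the jobscript name, node names and feature strings are hypothetical, and the helpers only build command lines rather than run them. It assumes the node's current feature list is already known as a comma-separated string.

```python
import shlex

FEATURE = "fixme"                 # marker feature, as in the description above
JOBSCRIPT = "maintenance_job.sh"  # hypothetical jobscript name


def with_feature(features, name=FEATURE):
    """Add `name` to a comma-separated feature string if it is missing."""
    parts = [f for f in features.split(",") if f]
    if name not in parts:
        parts.append(name)
    return ",".join(parts)


def without_feature(features, name=FEATURE):
    """Remove `name` from a comma-separated feature string."""
    return ",".join(f for f in features.split(",") if f and f != name)


def set_features_cmd(node, features):
    """Command that replaces the node's available features."""
    return ["scontrol", "update", f"NodeName={node}",
            f"AvailableFeatures={features}"]


def submit_cmd(jobscript=JOBSCRIPT):
    """One exclusive, top-priority job restricted to nodes still tagged."""
    return ["sbatch", "--nodes=1", "--exclusive",
            f"--constraint={FEATURE}", "--nice=-10000", jobscript]


def requeue_cmd(job_id):
    """Command a finished maintenance job could use to requeue itself."""
    return ["scontrol", "requeue", str(job_id)]


if __name__ == "__main__":
    # Dry run for two made-up nodes and X = 2 jobs at a time.
    for node, feats in {"c1-1": "ib", "c1-2": "ib,gpu"}.items():
        print(shlex.join(set_features_cmd(node, with_feature(feats))))
    for _ in range(2):
        print(shlex.join(submit_cmd()))
```

Inside the job itself, the node name and job ID would normally come from the SLURMD_NODENAME and SLURM_JOB_ID environment variables. How to handle a node whose feature list becomes empty after untagging is left open here.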

