I thought that's what scontrol reboot was all about: https://slurm.schedmd.com/scontrol.html#OPT_reboot

Just point RebootProgram in slurm.conf to a script in shared storage, and modify the script to do whatever you need to do, be that OS upgrades or simple reboots.

On 02/08/2017 11:11, Bjørn-Helge Mevik wrote:
> "Golpayegani, Navid (GSFC-6190)" <[email protected]> writes:
>
>> Hi all,
>>
>> Is there a way to submit maintenance jobs in a rolling fashion? What
>> I'm thinking is the ability to run a job on every node in a slurm
>> cluster/queue in exclusive mode but X at a time.
>
> We do this for rolling upgrades. Basically, we submit X copies of a
> jobscript that asks for exclusive access to any node with a feature
> "fixme" (actually, we use "vaskmeg" :). The jobs are run as root and
> specify --nice=-10000 to get highest priority. They do their job,
> remove the "fixme" feature from the node, and then request themselves
> to be requeued.
>
> Before submitting the jobs, we add the "fixme" feature to all nodes
> needing maintenance.
>
> (In reality, our setup is a little more complex, since it includes
> reinstalling the OS on the nodes, but the principle is the same.)
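
For reference, the quoted rolling scheme could be sketched roughly as below. This is only an illustration under assumptions, not the poster's actual setup: the helper names, the script name "maintenance.sh", and the NODE_KEYWORD value are hypothetical, and the scontrol keyword for node features may be Features, AvailableFeatures, or ActiveFeatures depending on your Slurm version. The functions only build command lines; nothing is executed.

```python
import shlex

FEATURE = "fixme"          # feature name from the post; pick your own
NODE_KEYWORD = "Features"  # assumption: may need AvailableFeatures/ActiveFeatures


def with_feature(features, feature=FEATURE):
    """Return the feature list with `feature` appended if it is missing."""
    return list(features) + ([] if feature in features else [feature])


def without_feature(features, feature=FEATURE):
    """Return the feature list with `feature` removed."""
    return [f for f in features if f != feature]


def update_features_cmd(node, features, keyword=NODE_KEYWORD):
    """Build an scontrol command that overwrites the node's feature list."""
    return ["scontrol", "update", f"NodeName={node}",
            f"{keyword}={','.join(features)}"]


def submit_cmds(script, copies, feature=FEATURE):
    """Build one sbatch command per concurrent slot (the 'X at a time')."""
    cmd = ["sbatch", "--nodes=1", "--exclusive", "--requeue",
           f"--constraint={feature}", "--nice=-10000", script]
    return [list(cmd) for _ in range(copies)]


def requeue_cmd(job_id):
    """Build the command a job runs last to put itself back in the queue."""
    return ["scontrol", "requeue", str(job_id)]


if __name__ == "__main__":
    # Step 1: tag a node that needs maintenance (existing features kept).
    print(shlex.join(update_features_cmd("node001", with_feature(["gpu"]))))
    # Step 2: submit X = 3 copies of the maintenance job.
    for cmd in submit_cmds("maintenance.sh", copies=3):
        print(shlex.join(cmd))
```

Inside the job itself, the last two steps would be the update_features_cmd call with without_feature(...) for the node it ran on, followed by requeue_cmd with the job's own ID (Slurm exports it as SLURM_JOB_ID). The negative --nice value only works for a privileged user, which matches the jobs being run as root.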
