Hi Byron,
Adding to Stephan's note, it is strongly recommended to do a dry-run
upgrade test of the database before upgrading the production slurmdbd.
Many details are given in
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
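A minimal sketch of such a dry-run, assuming a MySQL/MariaDB accounting
database named slurm_acct_db and a spare test host with the new 20.11
slurmdbd installed (user, database name and paths are just examples):

  # On the production DB server: dump the accounting database
  mysqldump -u slurm -p slurm_acct_db > /tmp/slurm_acct_db.sql

  # On the test host: load the dump into a scratch database
  mysql -u slurm -p -e "create database slurm_acct_db"
  mysql -u slurm -p slurm_acct_db < /tmp/slurm_acct_db.sql

  # Run the new slurmdbd in the foreground with verbose logging and
  # watch the schema conversion complete without errors
  slurmdbd -D -vvv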
If you have separate slurmdbd and slurmctld machines (recommended), the
next step is to upgrade the slurmctld.
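On the slurmctld host that boils down to something like this (assuming
systemd and that the shared soft link already points at 20.11):

  systemctl restart slurmctld
  # Confirm the controller now reports the new version
  scontrol show config | grep SLURM_VERSION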
Finally, you can upgrade the slurmd's while the cluster is running in
production mode. Since you have Slurm on NFS, following Chris's
recommendation of rebooting the nodes may be the safest approach.
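One way to do the reboots gradually, if your Slurm version supports
these scontrol options, is to let each node reboot as soon as it
becomes idle and return to service automatically, for example:

  scontrol reboot ASAP nextstate=resume reason="upgrade to 20.11" "node[001-099]"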
After upgrading everything to 20.11, you should next upgrade to 21.08.
Upgrading to the latest 22.05 should probably wait until a few more
minor releases are out.
/Ole
On 5/30/22 08:54, Stephan Roth wrote:
If you have the means to set up a test environment to try the upgrade
first, I recommend doing so.
The upgrade from 19.05 to 20.11 worked for two clusters I maintain with a
similar NFS-based setup, except that we keep the Slurm configuration
separate from the Slurm software accessible through NFS.
For updates staying within two major releases this should work well by
restarting the Slurm daemons in the recommended order (see
https://slurm.schedmd.com/SLUG19/Field_Notes_3.pdf) after switching the
soft link to 20.11 (a rough command sketch follows the list):
1. slurmdbd
2. slurmctld
3. individual slurmd on your nodes
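Roughly like this, assuming the shared installation lives under
/opt/slurm and the daemons run under systemd (adjust paths and
hostnames to your site):

  # On the NFS server (or wherever the link lives): point "default" at 20.11
  ln -sfn /opt/slurm/20.11.9 /opt/slurm/default

  # 1. On the database host
  systemctl restart slurmdbd

  # 2. On the controller host
  systemctl restart slurmctld

  # 3. On each compute node, one at a time
  systemctl restart slurmd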
To be able to revert to 19.05, you should dump the database between
stopping and starting slurmdbd, and back up the StateSaveLocation
between stopping and restarting slurmctld.
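For example (database name, dump file names and the StateSaveLocation
path are placeholders for your own values):

  # While slurmdbd is stopped: dump the accounting database
  mysqldump -u slurm -p slurm_acct_db > slurm_acct_db_19.05.sql

  # While slurmctld is stopped: back up the StateSaveLocation
  # ("scontrol show config | grep StateSaveLocation" shows the actual path)
  tar czf statesave_19.05.tar.gz /var/spool/slurmctld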
slurmstepd's of running jobs will continue to run on 19.05 after
restarting the slurmd's.
Check individual slurmd.log files for problems.
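For instance (the log path depends on SlurmdLogFile in your slurm.conf):

  # On each node, after restarting slurmd
  grep -iE "error|fatal" /var/log/slurm/slurmd.log
  slurmd -V    # confirm the new version is in use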
Cheers,
Stephan
On 30.05.22 00:09, byron wrote:
Hi
I'm currently doing an upgrade from 19.05 to 20.11.
All of our compute nodes have the same install of Slurm NFS-mounted. The
system has been set up so that all the start scripts and configuration
files point to the default installation, which is a soft link to the most
recent installation of Slurm.
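For illustration, the layout is roughly like this (paths are just
examples):

  $ readlink /apps/slurm/default
  /apps/slurm/19.05.7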
This is the first time I've done an upgrade of Slurm, and I had been
hoping to do a rolling upgrade as opposed to waiting for all the jobs to
finish on all the compute nodes and then switching across, but I don't see
how I can do it with this setup. Does anyone have any experience of
this?