Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Stephan Roth
I can confirm that what Ümit did worked for my setup as well. But as I mentioned before, if there's any doubt, try the upgrade in a test environment first. Cheers, Stephan On 30.05.22 21:06, Ümit Seren wrote: We did a couple of major and minor SLURM upgrades without draining the compute nodes.

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Ümit Seren
We did a couple of major and minor SLURM upgrades without draining the compute nodes. Once slurmdbd and slurmctld were updated to the new major version, we did a package update on the compute nodes and restarted slurmd on them. The existing running jobs continued to run fine and new jobs on the s
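The ordering described in this message (slurmdbd and slurmctld first, then slurmd on the compute nodes, with no drain) can be sketched as a small planning helper. This is only an illustrative sketch: the function name `plan_rolling_upgrade` and all host names are hypothetical, and it only builds the ordered list of steps; it does not run any package or service commands.

```python
from typing import List, Tuple


def plan_rolling_upgrade(
    dbd_host: str, ctld_host: str, compute_nodes: List[str]
) -> List[Tuple[str, str]]:
    """Return (host, daemon) steps in the order described in the message.

    Hypothetical helper: the host names are placeholders supplied by the caller.
    """
    # The database daemon and the controller are upgraded first.
    steps = [(dbd_host, "slurmdbd"), (ctld_host, "slurmctld")]
    # Only then is slurmd updated and restarted on each compute node.
    # Nodes are not drained in this plan, matching the message above.
    steps.extend((node, "slurmd") for node in compute_nodes)
    return steps


if __name__ == "__main__":
    for host, daemon in plan_rolling_upgrade("dbd01", "ctl01", ["node001", "node002"]):
        print(f"{host}: update packages, then restart {daemon}")
```

How packages are updated on each host depends on the distribution and on how Slurm was installed, so that part is deliberately left out of the sketch.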

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Ole Holm Nielsen
On 30-05-2022 19:34, Chris Samuel wrote: On 30/5/22 10:06 am, Chris Samuel wrote: If you switch that symlink those jobs will pick up the 20.11 srun binary and that's where you may come unstuck. Just to quickly fix that, srun talks to slurmctld (which would also be 20.11 for you), slurmctld w

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Chris Samuel
On 30/5/22 10:06 am, Chris Samuel wrote: If you switch that symlink those jobs will pick up the 20.11 srun binary and that's where you may come unstuck. Just to quickly fix that, srun talks to slurmctld (which would also be 20.11 for you), slurmctld will talk to the slurmd's running the job

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Chris Samuel
On 30/5/22 3:01 am, byron wrote: The one thing I'm unsure about is as much a Linux / NFS issue as a Slurm one. When I change the soft link for "default" to point to the new 20.11 slurm install but all the compute nodes are still running the old 19.05 version because they haven't been restarte

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread byron
Thanks for the feedback. I've done the database dry run on a clone of our database / slurmdbd and that is all good. We have a reboot program defined. The one thing I'm unsure about is as much a Linux / NFS issue as a Slurm one. When I change the soft link for "default" to point to the new 2

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Ole Holm Nielsen
Hi Byron, Adding to Stephan's note, it's strongly recommended to make a database dry-run upgrade test before upgrading the production slurmdbd. Many details are in https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm If you have separate slurmdbd and slurmctld machines (reco