Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Stephan Roth
, for us this worked smoothly. Best Ümit From: slurm-users on behalf of Ole Holm Nielsen Date: Monday, 30. May 2022 at 20:58 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Rolling upgrade of compute nodes On 30-05-2022 19:34, Chris Samuel wrote: On 30/5/22 10:06 am,

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Ümit Seren
same compute started by the updated slurmd daemon and also worked fine. So, for us this worked smoothly. Best Ümit From: slurm-users on behalf of Ole Holm Nielsen Date: Monday, 30. May 2022 at 20:58 To: slurm-users@lists.schedmd.com Subject: Re: [slurm-users] Rolling upgrade of compute

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Ole Holm Nielsen
On 30-05-2022 19:34, Chris Samuel wrote: On 30/5/22 10:06 am, Chris Samuel wrote: If you switch that symlink those jobs will pick up the 20.11 srun binary and that's where you may come unstuck. Just to quickly fix that, srun talks to slurmctld (which would also be 20.11 for you), slurmctld w

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Chris Samuel
On 30/5/22 10:06 am, Chris Samuel wrote: If you switch that symlink those jobs will pick up the 20.11 srun binary and that's where you may come unstuck. Just to quickly fix that, srun talks to slurmctld (which would also be 20.11 for you), slurmctld will talk to the slurmd's running the job

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Chris Samuel
On 30/5/22 3:01 am, byron wrote: The one thing I'm unsure about is as much a Linux / NFS issue as a Slurm one. When I change the soft link for "default" to point to the new 20.11 Slurm install but all the compute nodes are still running the old 19.05 version because they haven't been restarte

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread byron
Thanks for the feedback. I've done the database dry run on a clone of our database / slurmdbd and that is all good. We have a reboot program defined. The one thing I'm unsure about is as much a Linux / NFS issue as a Slurm one. When I change the soft link for "default" to point to the new 2

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-30 Thread Ole Holm Nielsen
Hi Byron, Adding to Stephan's note, it's strongly recommended to make a database dry-run upgrade test before upgrading the production slurmdbd. Many details are in https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm If you have separate slurmdbd and slurmctld machines (reco

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-29 Thread Stephan Roth
Hi Byron, If you have the means to set up a test environment to try the upgrade first, I recommend doing so. The upgrade from 19.05 to 20.11 worked for two clusters I maintain with a similar NFS-based setup, except we keep the Slurm configuration separated from the Slurm software accessible

Re: [slurm-users] Rolling upgrade of compute nodes

2022-05-29 Thread Christopher Samuel
On 5/29/22 3:09 pm, byron wrote: This is the first time I've done an upgrade of Slurm and I had been hoping to do a rolling upgrade, as opposed to waiting for all the jobs to finish on all the compute nodes and then switching across, but I don't see how I can do it with this setup. Does any on

[slurm-users] Rolling upgrade of compute nodes

2022-05-29 Thread byron
Hi, I'm currently doing an upgrade from 19.05 to 20.11. All of our compute nodes have the same install of Slurm NFS mounted. The system has been set up so that all the start scripts and configuration files point to the default installation, which is a soft link to the most recent installation of sl
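
For readers following the thread, here is a minimal sketch of how the "default" soft link described above could be repointed in a single step. It is not taken from any message in the thread. The function name `switch_default`, the directory names `19.05` and `20.11`, and the temporary-directory layout are all hypothetical, chosen only for illustration. The idea is to create the new link under a temporary name and rename it over the old one, so that on a local POSIX filesystem there is no moment at which "default" is missing. NFS clients may still cache the old target for a while, which is an assumption worth testing on your own setup.

```python
import os
import tempfile


def switch_default(install_root: str, new_version_dir: str,
                   link_name: str = "default") -> str:
    """Repoint install_root/link_name at new_version_dir via rename.

    All names here are hypothetical; adapt them to the real layout.
    Returns the target the link points at after the switch.
    """
    link_path = os.path.join(install_root, link_name)
    tmp_path = os.path.join(install_root, f".{link_name}.tmp.{os.getpid()}")

    # Clear a stale temporary link left behind by an interrupted run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)

    # Build the new link beside the old one, then rename it into place.
    # rename(2) replaces the existing symlink itself, not its target.
    os.symlink(new_version_dir, tmp_path)
    os.replace(tmp_path, link_path)
    return os.readlink(link_path)


if __name__ == "__main__":
    # Demonstration in a throwaway directory, not a real Slurm install.
    root = tempfile.mkdtemp()
    for version in ("19.05", "20.11"):
        os.mkdir(os.path.join(root, version))
    os.symlink("19.05", os.path.join(root, "default"))
    print(switch_default(root, "20.11"))
```

The switch only changes what later path lookups resolve to. Daemons that are already running keep the binaries they started with, while anything launched through the link afterwards, such as srun inside a running job, picks up the new version, which is the situation the replies in this thread discuss.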