On Aug 6, 2022, at 15:13, Chris Samuel <ch...@csamuel.org> wrote:
>
> On 6/8/22 10:43 am, David Magda wrote:
>
>> It seems that the new srun(1) cannot talk to the old slurmd(8).
>> Is this 'on purpose'? Does the backwards compatibility of the protocol not
>> extend to srun(1)?
>
> That's expected, what you're hoping for here is forward compatibility.
>
> Newer daemons know how to talk to older utilities, but it doesn't work the
> other way around.
>
> What we do in this situation is upgrade slurmdbd, then slurmctld, change our
> images for compute nodes to be ones that have the new Slurm version then
> before we bring partitions back up we issue an "scontrol reboot ASAP
> nextstate=resume" for all the compute nodes.

Cool. So the CLI stuff will be the last thing to ‘update’ (for us, by changing the place the link /opt/slurm points to).

> It's also safe to restart slurmd's with running jobs, though you may want to
> drain them before that so slurmctld won't try and send them a job in the
> middle.

My testing has shown that this is not the case: any running jobs are killed with signal 15 if I do a ‘systemctl restart slurmd’ or ‘service slurmd restart’.

Is there some flag in slurm.conf that lets running jobs continue uninterrupted across a slurmd restart?
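
For reference, a rough dry-run sketch of the upgrade order described above (slurmdbd, then slurmctld, then the compute nodes, with the CLI link repointed last). It only prints the commands and runs nothing. The node list and the /opt/slurm-NEW target are made-up placeholders, and restarting the daemons via systemctl is an assumption about how the upgraded binaries get picked up; only the "scontrol reboot ASAP nextstate=resume" line is taken from the thread.

```shell
#!/bin/sh
# Dry-run sketch of the upgrade order; prints commands instead of running them.
# NODES is a hypothetical node list; substitute your own.
NODES="${NODES:-node[001-004]}"

run() {
    # Print only. Replace the echo with "$@" to execute for real.
    echo "+ $*"
}

upgrade_plan() {
    # 1. slurmdbd first (assumes the new version is already installed)
    run systemctl restart slurmdbd
    # 2. then slurmctld
    run systemctl restart slurmctld
    # 3. compute node images now carry the new version; reboot into them
    #    as soon as each node is free, then return it to service
    run scontrol reboot ASAP nextstate=resume "$NODES"
    # 4. CLI tools last: repoint the link (target name is hypothetical)
    run ln -sfn /opt/slurm-NEW /opt/slurm
}

upgrade_plan
```

Keeping the steps in one function makes the ordering explicit and easy to review before removing the echo.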