On Aug 6, 2022, at 15:13, Chris Samuel <ch...@csamuel.org> wrote:
> 
> On 6/8/22 10:43 am, David Magda wrote:
> 
>> It seems that the new srun(1) cannot talk to the old slurmd(8).
>> Is this 'on purpose'? Does the backwards compatibility of the protocol not 
>> extend to srun(1)?
> 
> That's expected; what you're hoping for here is forward compatibility.
> 
> Newer daemons know how to talk to older utilities, but it doesn't work the 
> other way around.
> 
> What we do in this situation is upgrade slurmdbd, then slurmctld, then change 
> our compute node images to ones that have the new Slurm version. Before we 
> bring partitions back up, we issue an "scontrol reboot ASAP 
> nextstate=resume" for all the compute nodes.

Cool. So the CLI stuff will be the last thing to ‘update’ (for us, by changing 
the place the link /opt/slurm points to).
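
A minimal sketch of that last step, assuming the versioned installs sit side by side and /opt/slurm is a symlink to one of them. The function name and the version directory in the usage comment are hypothetical, not taken from our actual setup:

```shell
# Repoint a symlink (e.g. /opt/slurm) at a new versioned install directory.
# Assumption: installs live in sibling directories and only the link changes.
switch_slurm_link() {
    target="$1"   # directory holding the new Slurm install
    link="$2"     # symlink that users' PATH resolves through

    # Refuse to point the link at something that does not exist.
    if [ ! -d "$target" ]; then
        echo "no such directory: $target" >&2
        return 1
    fi

    # -s: symbolic, -f: replace the existing link,
    # -n: treat the existing link as a file, do not descend into its target.
    ln -sfn "$target" "$link"
}

# Usage (hypothetical path), run only after slurmdbd, slurmctld and the
# slurmd nodes are already on the new version:
#   switch_slurm_link /opt/slurm-NEW /opt/slurm
```

Note that ln -sfn removes and recreates the link, so there is a very short window in which /opt/slurm does not exist.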

> It's also safe to restart slurmds with running jobs, though you may want to 
> drain them before that so slurmctld won't try to send them a job in the 
> middle.

My testing has shown that this is not the case: any running jobs are killed 
with signal 15 (SIGTERM) if I do a 'systemctl restart slurmd' or 'service 
slurmd restart'. Is there some flag in slurm.conf that lets running jobs 
survive a slurmd restart?
