Hi Bjørn-Helge,

On 3/7/25 08:59, Bjørn-Helge Mevik via slurm-users wrote:
> My 2¢:
>
> If upgrading the deb packages does *not* restart the services, then you
> can just upgrade all the slurm packages on the controller, then restart
> slurmdbd first and slurmctld afterwards.  (This is how I do upgrades
> with rpms.)  If upgrading *does* restart the services, then you'd have
> to stop and disable them first (stop slurmctld, then slurmdbd), and
> after the upgrade, enable and start them (slurmdbd first), as others
> have answered.

Readers must understand that we are discussing *minor release* upgrades only.

When you have slurmctld and slurmdbd running on the same machine, all the Slurm packages get upgraded simultaneously. The question is whether systemd restarts the services as part of the package's post-install step. This is the case with the EL8 RPM packages built from the Slurm tarballs (see [1]), and this works great for us with *minor release* upgrades.
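If you are unsure whether your packages restart the services, one way to find out is to compare the time each unit last became active before and after the upgrade. Here is a minimal sketch, assuming a systemd host and using the unit property ActiveEnterTimestampMonotonic (microseconds since boot, so the comparison is only meaningful within one boot). The helper names parse_monotonic, active_since and was_restarted are my own for illustration and not part of Slurm or systemd:

```python
import subprocess

# systemd unit property: monotonic time (in microseconds since boot)
# at which the unit last entered the "active" state.
PROP = "ActiveEnterTimestampMonotonic"


def parse_monotonic(show_output: str) -> int:
    """Parse a line such as 'ActiveEnterTimestampMonotonic=123456'."""
    key, _, value = show_output.strip().partition("=")
    if key != PROP or not value.isdigit():
        raise ValueError(f"unexpected systemctl output: {show_output!r}")
    return int(value)


def active_since(unit: str) -> int:
    """Ask systemd when `unit` last became active (needs a systemd host)."""
    out = subprocess.run(
        ["systemctl", "show", "-p", PROP, unit],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_monotonic(out)


def was_restarted(before: int, after: int) -> bool:
    """True if the unit entered the active state again after `before`."""
    return after > before
```

Record active_since("slurmctld") and active_since("slurmdbd") before the package upgrade, run the upgrade, and pass the old and new values to was_restarted().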

As for the order of starting slurmctld and slurmdbd services running on the same server, I think it doesn't really matter with *minor release* upgrades, because there won't be any changes to the Slurm database format. Here I assume that Slurm minor upgrades don't crash the services :-) We have never experienced any such crashes for many, many past Slurm releases.

The slurmctld can be restarted immediately after upgrading without slurmdbd being available, so your cluster keeps running without any interruption of service. A little later you can enable and start slurmdbd, and the delay in starting slurmdbd doesn't cause any problems for slurmctld or the users. I emphasize that we're discussing *minor release* upgrades only!
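To make the two cases concrete, here is a small sketch that only builds the list of commands and does not execute anything. The function name minor_upgrade_plan and the upgrade placeholder are my own assumptions; substitute your own package manager command. The start order is a parameter, since the order is exactly the point under discussion:

```python
# Placeholder only: the real command depends on your distribution.
UPGRADE_STEP = "<upgrade the Slurm packages with your package manager>"


def minor_upgrade_plan(pkg_restarts_services, start_order=("slurmdbd", "slurmctld")):
    """Return the ordered steps for a *minor release* upgrade on a host
    running both slurmctld and slurmdbd. Nothing is executed here."""
    if not pkg_restarts_services:
        # Packages leave the daemons alone: upgrade, then restart them.
        return [UPGRADE_STEP] + [f"systemctl restart {s}" for s in start_order]
    # Packages would restart the daemons themselves: stop and disable
    # them first (slurmctld, then slurmdbd), upgrade, then enable and start.
    plan = [
        "systemctl stop slurmctld",
        "systemctl stop slurmdbd",
        "systemctl disable slurmctld slurmdbd",
        UPGRADE_STEP,
    ]
    plan += [f"systemctl enable --now {s}" for s in start_order]
    return plan


if __name__ == "__main__":
    for step in minor_upgrade_plan(pkg_restarts_services=True):
        print(step)
```

Calling minor_upgrade_plan(True, start_order=("slurmctld", "slurmdbd")) gives the variant where slurmctld comes back first and slurmdbd follows a little later.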

@Bjørn-Helge: Do you think there is good reason to start slurmdbd before slurmctld when doing minor release upgrades?

All in all, Slurm is very resilient when doing upgrades! Major release upgrades involve Slurm database format changes and must be done carefully; see the information in [2].

IMHO, best practice is to run slurmdbd and slurmctld on separate servers. I understand that with small clusters one may not be able to afford multiple servers, though.

Best regards,
Ole

[1] https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#build-slurm-packages
[2] https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrading-slurm

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
