A few things to check: make sure DNS/hostname resolution works; disable any firewalls for testing (you can lock things down again afterwards); and make sure the slurm.conf file is the same on all nodes.
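Something like the following could be used to run through those three checks from the affected node. It is only a rough sketch, not a definitive tool: the function names are made up for illustration, and any controller hostname or slurm.conf path you pass in is your own site's value. 6817 is the default SlurmctldPort, so use a different number if your slurm.conf overrides it.

```python
import hashlib
import socket


def resolves(hostname):
    """Return the resolved IPv4 address, or None if name resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or blocked by a firewall
        return False


def config_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it cannot be read."""
    try:
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    except OSError:
        return None


def check_node(controller, port, conf_path):
    """Run all three checks and return the results as a dict (hypothetical helper)."""
    address = resolves(controller)
    # Only attempt the TCP connection if the name resolved at all.
    reachable = port_open(address, port) if address else False
    return {
        "address": address,
        "reachable": reachable,
        "digest": config_digest(conf_path),
    }
```

Calling `check_node("your-controller", 6817, "/path/to/slurm.conf")` on each node and comparing the `digest` values shows whether the config files really match; `address` being None points at name resolution, while a resolved address with `reachable` False points at a firewall or a slurmctld that is not listening.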
I've just done a 20.11.9 to 24.05.2 upgrade, along with a CentOS 7.9 to RHEL 9.10 upgrade, on all my nodes.

Sid

On Tue, 19 Nov 2024, 03:23 Daniel Rodriguez Lopez (ext) via slurm-users, <slurm-users@lists.schedmd.com> wrote:

> Dear all,
>
> We recently tried to fix our version of slurm in every node of our
> cluster. After the installation (slurm 20.11.9) in one of the compute
> nodes, most of the commands (squeue, sinfo, scontrol show config etc)
> return this error:
>
> error: Unable to contact slurm controller (connect failure)
>
> The .log files don't show any errors, we have both debug values equal
> to debug5. Also, the rest of the cluster works as usual.
>
> I appreciate any insight on what could be the cause.
>
> Thank you and regards,
> Daniel
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com