A few things to check: make sure DNS/host name resolution works, and
disable any firewalls for testing (you can lock things down again
afterwards). Also make sure the slurm.conf file is identical on all nodes.
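
If it helps, here is a rough Python sketch of those three checks. It is only
an illustration, not anything Slurm ships: the function names are my own, and
you would need to supply your real controller host name, the port from your
slurm.conf (6817 is the default SlurmctldPort), and the path to your
slurm.conf.

```python
import hashlib
import socket


def resolves(host):
    """Return True if the host name resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except socket.gaierror:
        return False


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    A False result from the failing node, with True from a working node,
    points at a firewall or routing problem rather than at Slurm itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def file_sha256(path):
    """Hash a file so copies of slurm.conf can be compared across nodes."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run on the broken node, something like resolves("your-controller") and
port_open("your-controller", 6817) should both come back True, and
file_sha256() of your slurm.conf should give the same value there as on a
node that works.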

I've just done a 20.11.9 to 24.05.2 upgrade, along with a CentOS 7.9 to RHEL
9.10 upgrade, on all my nodes.

Sid

On Tue, 19 Nov 2024, 03:23 Daniel Rodriguez Lopez (ext) via slurm-users, <
slurm-users@lists.schedmd.com> wrote:

> Dear all,
>
> We recently tried to fix our version of slurm on every node of our
> cluster. After the installation (slurm 20.11.9) on one of the compute
> nodes, most of the commands (squeue, sinfo, scontrol show config, etc.)
> return this error:
>
>   error: Unable to contact slurm controller (connect failure)
>
> The .log files don't show any errors, even though both debug levels are
> set to debug5. Also, the rest of the cluster works as usual.
>
> I appreciate any insight on what could be the cause.
>
> Thank you and regards,
> Daniel
>
>
> --
> slurm-users mailing list -- slurm-users@lists.schedmd.com
> To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
>
