Re: [slurm-users] 2 nodes being randomly set to "not responding"

2021-07-21 Thread jose
Hi, most likely you want to set it the exact opposite way, as the Slurm cloud scheduling guide says: "TreeWidth Since the slurmd daemons are not aware of the network addresses of other nodes in the cloud, the slurmd daemons on each node should be sent messages directly and not forward those me
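The point of this reply can be sketched in Python under a simplified model. This is an assumption for illustration, not Slurm's actual fan-out code: if TreeWidth is at least the number of nodes, slurmctld contacts every slurmd directly, so no slurmd has to forward messages to (or know the address of) another node. The helper names `parse_tree_width` and `messages_sent_directly` are hypothetical, and the parser only handles plain `TreeWidth=N` entries.

```python
import re
from typing import Optional


def parse_tree_width(conf_text: str) -> Optional[int]:
    """Return the TreeWidth value from slurm.conf text, or None if it is not set.

    Hypothetical helper: strips '#' comments and matches the key
    case-insensitively, since slurm.conf parameter names are case insensitive.
    """
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        match = re.search(r"\bTreeWidth\s*=\s*(\d+)", line, flags=re.IGNORECASE)
        if match:
            return int(match.group(1))
    return None


def messages_sent_directly(tree_width: int, node_count: int) -> bool:
    """Simplified model (assumption): every slurmd is contacted directly by
    slurmctld only when the fan-out width covers all nodes, so no slurmd
    forwards messages to another slurmd."""
    return node_count <= tree_width


if __name__ == "__main__":
    # Hypothetical config and node count, for illustration only.
    conf = "ClusterName=demo\nTreeWidth=65533  # contact every slurmd directly\n"
    width = parse_tree_width(conf)
    print(width, messages_sent_directly(width, 120))
```

In this model, a small TreeWidth with many nodes means some slurmd daemons relay messages to others, which is exactly what breaks when nodes on one network cannot reach nodes on the other.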

[slurm-users] 2 nodes being randomly set to "not responding"

2021-07-21 Thread Russell Jones
Hi all, we have a single Slurm cluster with multiple different architectures and compute clusters talking to a single slurmctld. This slurmctld is dual-homed on two different networks. We have two individual nodes that are by themselves on "network 2" while all of the other nodes are on "network 1"

Re: [slurm-users] 4 sockets but "

2021-07-21 Thread Ole Holm Nielsen
Hi Diego, On 21-07-2021 11:56, Diego Zuccato wrote: I suspended testing config changes to update another machine. In the last test I added "CPUs=192" to the node definition, restarted slurmctld and nothing changed. When I returned, I checked again and slurm reported 192 CPUs! Magic? I now remo

Re: [slurm-users] 4 sockets but "

2021-07-21 Thread Diego Zuccato
Hello all. I'm speechless. I suspended testing config changes to update another machine. In the last test I added "CPUs=192" to the node definition, restarted slurmctld and nothing changed. When I returned, I checked again and slurm reported 192 CPUs! Magic? I now removed CPUs=192, restarted s
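When debugging a CPU count like the one in this thread, a quick consistency check can help. If CPUs= is omitted from a node definition, slurm.conf derives it from the product of Boards, Sockets, CoresPerSocket and ThreadsPerCore. The Python sketch below assumes a single-line NodeName definition with space-separated key=value pairs, and treats any missing topology field as 1 (an assumption). The node name and the CoresPerSocket=24 / ThreadsPerCore=2 values are hypothetical, chosen only so that 4 sockets give 192 CPUs.

```python
from typing import Dict, Optional, Tuple


def parse_node_line(node_line: str) -> Dict[str, str]:
    """Split a one-line NodeName definition into {key: value}, keys lowercased."""
    fields: Dict[str, str] = {}
    for token in node_line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key.lower()] = value
    return fields


def cpu_counts(node_line: str) -> Tuple[int, Optional[int]]:
    """Return (topology product, explicit CPUs= value or None).

    Missing Boards/Sockets/CoresPerSocket/ThreadsPerCore are treated as 1
    (assumption for this sketch).
    """
    fields = parse_node_line(node_line)
    product = 1
    for key in ("boards", "sockets", "corespersocket", "threadspercore"):
        product *= int(fields.get(key, "1"))
    explicit = int(fields["cpus"]) if "cpus" in fields else None
    return product, explicit


if __name__ == "__main__":
    # Hypothetical node definition for illustration only.
    line = "NodeName=bignode Sockets=4 CoresPerSocket=24 ThreadsPerCore=2"
    print(cpu_counts(line))
    print(cpu_counts(line + " CPUs=192"))
```

If the two numbers disagree, or the product does not match what slurmd -C reports on the node, the node definition is a likely place to look.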

Re: [slurm-users] 4 sockets but "

2021-07-21 Thread Diego Zuccato
Uff... A bit mangled... Correcting and resending. On 21/07/2021 08:18, Diego Zuccato wrote: On 20/07/2021 18:02, mercan wrote: Hi Ahmet. Did you check the slurmctld log for a complaint about the host line? If slurmctld cannot recognize a parameter, maybe it gives up processing the whole