The SlurmctldHost value is set like the following in my slurm.conf:

SlurmctldHost=host0,host1

That seems to be legal according to the documentation. However, I get error messages like the following:

$ srun id
srun: error: get_addr_info: getaddrinfo() failed: Name or service not known
srun: error: slurm_set_addr: Unable to resolve "host0,host1"
srun: error: Unable to establish control machine address
srun: error: Unable to allocate resources: Address already in use

If I try to put IP addresses in parentheses per the documentation, I get different errors:

$ srun id
srun: error: Bad value "host0(12.34.56.78),host1" for SlurmctldHost
srun: error: No SlurmctldHost defined.
srun: fatal: Unable to process configuration file

If I put a single hostname, or a hostname with an address in parentheses as the value for SlurmctldHost, it works fine but I have no failover.

I’m running 23.02.6:

$ sinfo --version
slurm 23.02.6

What’s going on?

--
Gary