Apologies, I don't have a complete answer. I'm not sure why that doesn't
work, but my understanding is that failover is configured with one
"SlurmctldHost" line per controller, e.g.:
SlurmctldHost=host1
SlurmctldHost=host2
...
The list format seems to be used i
Slurm versions 23.11.1, 23.02.7, 22.05.11 are now available and address
a number of recently-discovered security issues. They've been assigned
CVE-2023-49933 through CVE-2023-49938.
SchedMD customers were informed on November 29th and provided a patch on
request; this process is documented in
The SlurmctldHost value is set like the following in my slurm.conf:
SlurmctldHost=host0,host1
That seems to be legal according to the documentation. However, I get error
messages like the following:
$ srun id
srun: error: get_addr_info: getaddrinfo() failed: Name or service not known
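The getaddrinfo() error suggests that the whole string "host0,host1" is being handed to the resolver as a single hostname. As a rough diagnostic, here is a minimal Python sketch that pulls the SlurmctldHost values out of a slurm.conf text, strips an optional "(address)" suffix, flags comma-separated values, and tries to resolve each remaining name. The parsing rules here (comments start with "#", keys matched case-insensitively) are simplifying assumptions, not Slurm's own parser, and the function names are hypothetical.

```python
import re
import socket

# Matches "SlurmctldHost=<value>" at the start of a line (assumption: keys are
# matched case-insensitively, and "#" starts a comment).
_CTLD_RE = re.compile(r"^\s*SlurmctldHost\s*=\s*(\S+)", re.IGNORECASE)


def slurmctld_hosts(conf_text: str) -> list[str]:
    """Return the raw SlurmctldHost values in file order (first = primary)."""
    values = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        m = _CTLD_RE.match(line)
        if m:
            values.append(m.group(1))
    return values


def hostname_part(value: str) -> str:
    """Drop an optional "(address)" suffix, e.g. "ctl(10.0.0.1)" -> "ctl"."""
    return value.split("(", 1)[0]


def check(conf_text: str) -> list[str]:
    """Return a human-readable list of suspicious SlurmctldHost values."""
    problems = []
    for value in slurmctld_hosts(conf_text):
        name = hostname_part(value)
        if "," in name:
            # A comma list would reach getaddrinfo() as one bogus hostname.
            problems.append(
                f"{value!r}: comma-separated list; "
                "consider one SlurmctldHost line per controller"
            )
            continue
        try:
            socket.getaddrinfo(name, None)
        except socket.gaierror as exc:
            problems.append(f"{value!r}: getaddrinfo failed ({exc})")
    return problems
```

To try it, read your config (the path varies by installation, /etc/slurm/slurm.conf is only a common default) and print `check(open(path).read())`; an empty list means every controller name resolved.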
On 12/13/23 10:44, John Joseph wrote:
Hi Ole,
Thanks for the mail, and sorry for not explaining properly what I was
asking for. What I actually meant was: how can we check how well the HPC
system I set up is working?
E.g., a program that can be run individually on a node, and then comparing
how the same code performs when run through Slurm.
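For the "run it on a node, then compare" idea, a tiny timing script is enough to get started. The sketch below is a hypothetical example, not a standard benchmark: it times a fixed CPU-bound loop a few times and prints the hostname with the best and mean wall-clock times. For real measurements, established suites such as HPL (Linpack) or STREAM are the usual choice.

```python
import argparse
import socket
import time


def workload(n: int) -> float:
    """Fixed, deterministic CPU-bound loop: partial sum of 1/i^2 (hypothetical stand-in)."""
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / (i * i)
    return total


def run_benchmark(n: int, repeats: int) -> dict:
    """Time the workload `repeats` times; report best and mean wall-clock seconds."""
    timings = []
    result = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        result = workload(n)
        timings.append(time.perf_counter() - start)
    return {
        "host": socket.gethostname(),
        "n": n,
        "best_s": min(timings),
        "mean_s": sum(timings) / len(timings),
        "result": result,
    }


def main() -> None:
    parser = argparse.ArgumentParser(description="Tiny single-node timing test")
    parser.add_argument("--n", type=int, default=2_000_000)
    parser.add_argument("--repeats", type=int, default=3)
    args = parser.parse_args()
    stats = run_benchmark(args.n, args.repeats)
    print(
        f"{stats['host']}: n={stats['n']} best={stats['best_s']:.3f}s "
        f"mean={stats['mean_s']:.3f}s result={stats['result']:.6f}"
    )


if __name__ == "__main__":
    main()
```

Run it directly on a node (`python3 bench.py`), then through Slurm (`srun -N1 python3 bench.py`, or `srun -N4 --ntasks-per-node=1 python3 bench.py` for one copy on each of the 4 nodes). Similar timings suggest the scheduler adds little overhead to the job itself; large differences between nodes point at hardware or configuration differences worth investigating.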
On 12/13/23 07:13, John Joseph wrote:
We have set up Slurm on a 4-node HPC cluster.
We want to run a stress test and would like guidance on code that can
test how efficiently Slurm is working. If there is such a program, we
would like to try it out.
Guidance requested.
Then pl