On 3/4/20 10:12 AM, Alexander Grund wrote:
> we have a Power9 partition with 44 processors having 4 cores each totaling
> 176.
What is your hardware configuration? Do you have 1 server with 44
processor sockets, and each processor has 4 CPU cores? Or is it maybe 1
server with 1 or more sockets for a total of 44 CPU cores, and each CPU
core is running 4 hyperthreads?
I think you should give the relevant node and partition lines from your
slurm.conf.
Which Slurm version do you run?
> `scontrol show node <node>` shows "CoresPerSocket=22" and "CPUTot=176"
> which confuses me. Especially as `whypending` reports e.g. "172 cores free: 1"
The whypending tool does not show up in a Google search. Where did you get
it from, and what does it do?
> So what are "CPUs" and what are "Cores" to SLURM? Why does it mix up those 2?
> Most importantly: Does this mean `--cpus-per-task` can be as high as 176
> on this node and `--mem-per-cpu` can be up to the reported "RealMemory"/176?
Perhaps this page will be of use to you:
https://slurm.schedmd.com/cpu_management.html
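
As a rough sketch of how the reported numbers might fit together: when a node
definition does not set CPUs explicitly, Slurm derives the CPU count from the
product of sockets, cores per socket, and threads per core, so a "CPU" can be
a hardware thread rather than a physical core. The values Sockets=2 and
ThreadsPerCore=4 below are my assumptions (they are not confirmed in this
thread), chosen only because they reproduce CPUTot=176 from CoresPerSocket=22.
The function names are hypothetical helpers, not part of Slurm.

```python
def total_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs on a node when every hardware thread counts as one CPU."""
    return sockets * cores_per_socket * threads_per_core


def max_mem_per_cpu_mb(real_memory_mb: int, cpus: int) -> int:
    """Largest whole-MB share per CPU if all CPUs of the node are requested."""
    return real_memory_mb // cpus


sockets = 2            # assumption, not stated in the thread
cores_per_socket = 22  # from "CoresPerSocket=22" in the scontrol output
threads_per_core = 4   # assumption (4 hardware threads per core)

cpu_tot = total_cpus(sockets, cores_per_socket, threads_per_core)
print(cpu_tot)  # 176, matching "CPUTot=176"
```

Under those assumptions the node would have 44 physical cores and 176 CPUs
as Slurm counts them. Whether `--cpus-per-task` can really go up to 176
depends on the node and partition lines in your slurm.conf, which is why I
asked for them.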
/Ole