On 4/29/21 1:06 AM, Michael Robbert wrote:
I think that you want to use the output of slurmd -C, but if that isn't telling you the truth then you may not have built Slurm with the correct libraries. I believe that you need to build with hwloc in order to get the most accurate details of the CPU topology. Make sure you have hwloc-devel installed and try to rebuild Slurm.
Slurm has many build prerequisites. What I believe is the full list is here: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-prerequisites
If you have recent Xeon or EPYC CPUs, you should also consider the number of NUMA domains per socket. Use numactl -H to see what you've got.
Regarding slurmd -C and multiple NUMA domains per socket, there's a small bug being sorted out in https://bugs.schedmd.com/show_bug.cgi?id=11434
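As a rough way to compare the two tools, here is a minimal Python sketch that reads the first line of slurmd -C output and the "available: N nodes" line of numactl -H output, then estimates the number of NUMA domains per socket. The function names and the sample strings are made up for illustration (they are not output from a real node), and the sketch assumes that slurmd -C prints SocketsPerBoard and, optionally, Boards on its first line. Treat the result as a sanity check only, since it relies on the socket count that slurmd -C itself reports.

```python
import re

# Hypothetical sample outputs, for illustration only (not taken from a real node).
SAMPLE_SLURMD = (
    "NodeName=node001 CPUs=40 Boards=1 SocketsPerBoard=2 "
    "CoresPerSocket=20 ThreadsPerCore=1 RealMemory=191000\n"
    "UpTime=1-02:03:04\n"
)
SAMPLE_NUMACTL = (
    "available: 4 nodes (0-3)\n"
    "node 0 cpus: 0 1 2 3 4 5 6 7 8 9\n"
    "node 0 size: 47000 MB\n"
)


def parse_slurmd_c(text):
    """Turn the first line of `slurmd -C` output into a dict of key=value pairs."""
    first_line = text.strip().splitlines()[0]
    return dict(pair.split("=", 1) for pair in first_line.split() if "=" in pair)


def count_numa_nodes(text):
    """Read the node count from the 'available: N nodes (...)' line of `numactl -H`."""
    match = re.search(r"^available:\s+(\d+)\s+nodes?", text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'available: N nodes' line found")
    return int(match.group(1))


def numa_domains_per_socket(slurmd_text, numactl_text):
    """Estimate NUMA domains per socket from the two command outputs."""
    node = parse_slurmd_c(slurmd_text)
    # Assumption: total sockets = Boards * SocketsPerBoard (Boards defaults to 1).
    sockets = int(node.get("Boards", 1)) * int(node["SocketsPerBoard"])
    return count_numa_nodes(numactl_text) / sockets


if __name__ == "__main__":
    print(numa_domains_per_socket(SAMPLE_SLURMD, SAMPLE_NUMACTL))
```

On a real node you would pass in the captured text of the two commands, for example via subprocess.run(["slurmd", "-C"], capture_output=True, text=True).stdout, instead of the sample strings.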
It may be beneficial for HPC applications to enable Sub NUMA Cluster (SNC) in the BIOS, see https://www.dell.com/support/kbdoc/da-dk/000176921/bios-characterization-for-hpc-with-intel-cascade-lake-processors
/Ole