Hi,


my GPU testing system (named “gpu-node”) is a simple computer with one socket
and an "Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz" processor. Running "lscpu",
I can see there are 4 cores per socket, 2 threads per core and 8 CPUs:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Core(TM) i7 CPU         950  @ 3.07GHz
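
To spell out how those numbers relate, here is a small Python sketch that derives the physical-core and logical-CPU counts from the lscpu fields above. The helper name parse_lscpu is my own, made up for illustration:

```python
# Relevant lines copied from the lscpu output above.
LSCPU_OUTPUT = """\
CPU(s):                8
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
"""


def parse_lscpu(text):
    """Turn 'Key:   value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


info = parse_lscpu(LSCPU_OUTPUT)
sockets = int(info["Socket(s)"])
cores = sockets * int(info["Core(s) per socket"])   # physical cores
threads = cores * int(info["Thread(s) per core"])   # logical CPUs
print(cores, threads)
```

So the machine has 4 physical cores, and the 8 "CPUs" that lscpu reports are hardware threads (2 per core).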





My “gres.conf” file is:

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 CPUs=0-1
NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 CPUs=2-3



Running “numactl -H” on the “gpu-node” host reports:

available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 7809 MB
node 0 free: 6597 MB
node distances:
node   0
  0:  10



CPUs 0-1 are assigned to the first GPU and CPUs 2-3 to the second. However,
“lscpu” shows 8 CPUs. If I rewrite “gres.conf” like this:

NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 CPUs=0-3
NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 CPUs=4-7



and run “scontrol reconfigure”, the slurmctld log reports this error message:

[2024-06-05T11:42:18.558] error: _node_config_validate: gres/gpu: invalid GRES core specification (4-7) on node gpu-node



So I think SLURM can only see physical cores, not hardware threads, which
means my node can only offer 4 cores (as “lscpu” shows), even though in
gres.conf I have to write “CPUs”, not “Cores”. Is that right?
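
To make that guess concrete, here is a minimal Python sketch of the check I suspect is happening. It is only my assumption about the logic, not SLURM's actual code, and the function names parse_range and is_valid_core_spec are made up for illustration. It treats the values in gres.conf as physical core indices, so with 1 socket and 4 cores per socket only indices 0-3 would be accepted:

```python
def parse_range(spec):
    """Expand a spec such as "0-1,4" into a sorted list of indices."""
    indices = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            indices.update(range(int(lo), int(hi) + 1))
        else:
            indices.add(int(part))
    return sorted(indices)


def is_valid_core_spec(spec, sockets, cores_per_socket):
    """Assumption: indices must name physical cores, not hardware threads."""
    total_cores = sockets * cores_per_socket
    return all(0 <= i < total_cores for i in parse_range(spec))


# Topology taken from the lscpu output: 1 socket, 4 cores per socket.
for spec in ("0-1", "2-3", "0-3", "4-7"):
    print(spec, is_valid_core_spec(spec, sockets=1, cores_per_socket=4))
```

Under that assumption, "0-1", "2-3" and "0-3" pass while "4-7" fails, which would match the error in the slurmctld log.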



But if “numactl -H” shows 8 CPUs, why can’t I use these 8 CPUs in “gres.conf”?



Sorry for the long email.



Thanks.