Hi,
My GPU testing system (named “gpu-node”) is a simple single-socket computer with an "Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz" processor. Running "lscpu", I can see there are 4 cores per socket, 2 threads per core and 8 CPUs:

    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                8
    On-line CPU(s) list:   0-7
    Thread(s) per core:    2
    Core(s) per socket:    4
    Socket(s):             1
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 26
    Model name:            Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz

My “gres.conf” file is:

    NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 CPUs=0-1
    NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 CPUs=2-3

Running “numactl -H” on the “gpu-node” host reports:

    available: 1 nodes (0)
    node 0 cpus: 0 1 2 3 4 5 6 7
    node 0 size: 7809 MB
    node 0 free: 6597 MB
    node distances:
    node   0
      0:  10

So CPUs 0-1 are assigned to the first GPU and CPUs 2-3 to the second GPU. However, “lscpu” shows 8 CPUs… If I rewrite “gres.conf” this way:

    NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-X File=/dev/nvidia0 CPUs=0-3
    NodeName=gpu-node Name=gpu Type=GeForce-GTX-TITAN-Black File=/dev/nvidia1 CPUs=4-7

then when I run “scontrol reconfigure”, the slurmctld log reports this error message:

    [2024-06-05T11:42:18.558] error: _node_config_validate: gres/gpu: invalid GRES core specification (4-7) on node gpu-node

So I think SLURM only counts physical cores and not threads, which would mean my node can only offer 4 cores (the “Core(s) per socket” value in “lscpu”). But in gres.conf the parameter I have to write is “CPUs”, not “Cores”… isn’t it? And if “numactl -H” shows 8 CPUs, why can't I use all 8 of these CPUs in “gres.conf”?

Sorry about this long email.

Thanks.
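P.S. To make my guess concrete, here is a minimal Python sketch of the arithmetic I have in mind. It is not Slurm code: the helper names (parse_lscpu, physical_cores, expand_range, fits_core_count) are my own, and the rule that the gres.conf range is checked against physical core indices (sockets x cores per socket) rather than hardware threads is only my assumption, based on the log message above.

```python
def parse_lscpu(text):
    """Parse 'Key: value' lines from lscpu output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def physical_cores(info):
    """Number of physical cores: sockets times cores per socket."""
    return int(info["Socket(s)"]) * int(info["Core(s) per socket"])


def expand_range(spec):
    """Expand a spec such as '0-3' or '0,2-3' into a list of indices."""
    ids = []
    for part in spec.split(","):
        lo, sep, hi = part.partition("-")
        ids.extend(range(int(lo), int(hi if sep else lo) + 1))
    return ids


def fits_core_count(spec, n_cores):
    """ASSUMPTION (hypothetical rule): indices must lie in 0..n_cores-1."""
    return all(0 <= i < n_cores for i in expand_range(spec))


# Relevant lines copied from the lscpu output of gpu-node.
LSCPU = """\
CPU(s): 8
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
"""

info = parse_lscpu(LSCPU)
cores = physical_cores(info)  # 1 socket * 4 cores = 4
for spec in ("0-1", "2-3", "0-3", "4-7"):
    print(spec, fits_core_count(spec, cores))
```

Under that assumption, 0-1, 2-3 and 0-3 fit within the 4 physical cores, while 4-7 does not, which would match the "invalid GRES core specification (4-7)" error.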