Hi Dave,

On Wed, Oct 25, 2017 at 9:23 PM, Dave Sizer <dsi...@nvidia.com> wrote:
> For some reason, we are observing that the preferred CPUs defined in
> gres.conf for GPU devices are being ignored when running jobs. That is, in
> our gres.conf we have gpu resource lines, such as:
>
> Name=gpu Type=kepler File=/dev/nvidia0 CPUs=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
> Name=gpu Type=kepler File=/dev/nvidia4 CPUs=8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31
In passing, you can use range notation for CPU indexes to make this more compact:

Name=gpu Type=kepler File=/dev/nvidia0 CPUs=[0-7,16-23]
Name=gpu Type=kepler File=/dev/nvidia4 CPUs=[8-15,24-31]

> but when we run a job with the second gpu allocated,
> /sys/fs/cgroup/cpuset/slurm/…./cpuset.cpus reports that the job has been
> allocated cpus from the first gpu's set. It seems as if the CPU/GPU
> affinity in gres.conf is being completely ignored. Slurmd.log doesn't seem
> to mention anything about it with maximum debug verbosity.

You can try setting DebugFlags=CPU_Bind,Gres in your slurm.conf for more details.

> We have tried the following TaskPlugin settings: "task/affinity,task/cgroup"
> and just "task/cgroup". In both cases we have tried setting TaskPluginParam
> to "Cpuset". All of these configurations produced the same incorrect
> results.

We use this:

SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

and on a 4-GPU node with a gres.conf like this (don't ask, some vendors like their CPU ids alternating between sockets):

NodeName=sh-114-03 Name=gpu File=/dev/nvidia[0-1] CPUs=0,2,4,6,8,10,12,14,16,18
NodeName=sh-114-03 Name=gpu File=/dev/nvidia[2-3] CPUs=1,3,5,7,9,11,13,15,17,19

we can submit four jobs using one GPU each, and each ends up getting a CPU id that matches the allocated GPU:

$ sbatch --array=1-4 -p gpu -w sh-114-03 --gres=gpu:1 --wrap="sleep 100"
Submitted batch job 2669681
$ scontrol -dd show job 2669681 | grep CPU_ID | sort
     Nodes=sh-114-03 CPU_IDs=0 Mem=12800 GRES_IDX=gpu(IDX:0)
     Nodes=sh-114-03 CPU_IDs=1 Mem=12800 GRES_IDX=gpu(IDX:2)
     Nodes=sh-114-03 CPU_IDs=2 Mem=12800 GRES_IDX=gpu(IDX:1)
     Nodes=sh-114-03 CPU_IDs=3 Mem=12800 GRES_IDX=gpu(IDX:3)

How do you check which GPU your job has been allocated?

Cheers,
--
Kilian
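
P.S. If it helps to reason about the bracket notation: the range list inside CPUs=[...] simply expands to the individual CPU ids. The helper below is purely illustrative (it is not part of Slurm, just a sketch of how the expansion works):

```shell
#!/bin/sh
# Illustrative helper, NOT a Slurm tool: expand a CPUs= range list
# such as "0-7,16-23" into the individual CPU ids it denotes.
expand_cpu_ranges() {
    # Split on commas, then expand each "lo-hi" (or bare "n") with seq,
    # and join everything back into one space-separated line.
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        hi=${hi:-$lo}          # a bare "n" means the range n-n
        seq "$lo" "$hi"
    done | paste -sd' ' -
}

expand_cpu_ranges "0-7,16-23"
# Prints: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
```

So CPUs=[0-7,16-23] in gres.conf is equivalent to the explicit comma-separated list in your original file.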