Hi Dave,
On Wed, Oct 25, 2017 at 9:23 PM, Dave Sizer <[email protected]> wrote:
> For some reason, we are observing that the preferred CPUs defined in
> gres.conf for GPU devices are being ignored when running jobs. That is, in
> our gres.conf we have gpu resource lines, such as:
>
> Name=gpu Type=kepler File=/dev/nvidia0 CPUs=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
> Name=gpu Type=kepler File=/dev/nvidia4 CPUs=8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31
In passing, you can use range notation for the CPU indexes to make this
more compact:
Name=gpu Type=kepler File=/dev/nvidia0 CPUs=[0-7,16-23]
Name=gpu Type=kepler File=/dev/nvidia4 CPUs=[8-15,24-31]
> but when we run a job with the second gpu allocated,
> /sys/fs/cgroup/cpuset/slurm/…./cpuset.cpus reports that the job has been
> allocated cpus from the first gpu’s set. It seems as if the CPU/GPU
> affinity in gres.conf is being completely ignored. Slurmd.log doesn’t seem
> to mention anything about it with maximum debug verbosity.
You can try setting DebugFlags=CPU_Bind,Gres in your slurm.conf to get more details in the logs.
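For the record, the relevant lines would look like this (the SlurmdDebug
level is just a suggestion, adjust to taste; remember to run "scontrol
reconfigure" or restart the daemons after editing):

DebugFlags=CPU_Bind,Gres
SlurmdDebug=debug2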
> We have tried the following TaskPlugin settings: “task/affinity,task/cgroup”
> and just “task/cgroup”. In both cases we have tried setting TaskPluginParam
> to “Cpuset”. All of these configurations produced the same incorrect
> results.
We use this:
SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
and for a 4-GPU node which has a gres.conf like this (don't ask, some
vendors like their CPU ids alternating between sockets):
NodeName=sh-114-03 Name=gpu File=/dev/nvidia[0-1] CPUs=0,2,4,6,8,10,12,14,16,18
NodeName=sh-114-03 Name=gpu File=/dev/nvidia[2-3] CPUs=1,3,5,7,9,11,13,15,17,19
we can submit 4 jobs using 1 GPU each, and each ends up getting a CPU id
that matches its allocated GPU:
$ sbatch --array=1-4 -p gpu -w sh-114-03 --gres=gpu:1 --wrap="sleep 100"
Submitted batch job 2669681
$ scontrol -dd show job 2669681 | grep CPU_ID | sort
Nodes=sh-114-03 CPU_IDs=0 Mem=12800 GRES_IDX=gpu(IDX:0)
Nodes=sh-114-03 CPU_IDs=1 Mem=12800 GRES_IDX=gpu(IDX:2)
Nodes=sh-114-03 CPU_IDs=2 Mem=12800 GRES_IDX=gpu(IDX:1)
Nodes=sh-114-03 CPU_IDs=3 Mem=12800 GRES_IDX=gpu(IDX:3)
How do you check which GPU your job has been allocated?
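For reference, a quick way to check from inside the job itself (assuming
your gres/gpu setup makes Slurm export CUDA_VISIBLE_DEVICES; note that
with cgroup device isolation the indexes may be renumbered starting from
0, so nvidia-smi -L will only list the devices the job can actually see):

$ srun -p gpu --gres=gpu:1 bash -c 'echo $CUDA_VISIBLE_DEVICES; nvidia-smi -L'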
Cheers,
--
Kilian