Hi Dave,

On Wed, Oct 25, 2017 at 9:23 PM, Dave Sizer <dsi...@nvidia.com> wrote:
> For some reason, we are observing that the preferred CPUs defined in
> gres.conf for GPU devices are being ignored when running jobs.  That is, in
> our gres.conf we have gpu resource lines, such as:
>
> Name=gpu Type=kepler File=/dev/nvidia0
> CPUs=0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23
> Name=gpu Type=kepler File=/dev/nvidia4
> CPUs=8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31

In passing, you can use range notation for CPU indexes to make this
more compact:

Name=gpu Type=kepler File=/dev/nvidia0 CPUs=[0-7,16-23]
Name=gpu Type=kepler File=/dev/nvidia4 CPUs=[8-15,24-31]

> but when we run a job with the second gpu allocated,
> /sys/fs/cgroup/cpuset/slurm/…./cpuset.cpus reports that the job has been
> allocated cpus from the first gpu’s set.  It seems as if the CPU/GPU
> affinity in gres.conf is being completely ignored.  Slurmd.log doesn’t seem
> to mention anything about it with maximum debug verbosity.

You can try setting DebugFlags=CPU_Bind,Gres in your slurm.conf to get more details.
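
For instance, a minimal sketch of how to turn that on and look for the
relevant messages (assuming slurmd logs to /var/log/slurm/slurmd.log;
adjust to whatever your SlurmdLogFile points to):

# in slurm.conf
DebugFlags=CPU_Bind,Gres
SlurmdDebug=debug2

$ scontrol reconfigure
$ grep -iE 'gres|cpu_bind' /var/log/slurm/slurmd.log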

> We have tried the following TaskPlugin settings: “task/affinity,task/cgroup”
> and just “task/cgroup”.  In both cases we have tried setting TaskPluginParam
> to “Cpuset”.  All of these configurations produced the same incorrect
> results.

We use this:

SelectType=select/cons_res
SelectTypeParameters=CR_CORE_MEMORY
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

and for a 4-GPU node which has a gres.conf like this (don't ask, some
vendors like their CPU ids alternating between sockets):

NodeName=sh-114-03 Name=gpu File=/dev/nvidia[0-1] CPUs=0,2,4,6,8,10,12,14,16,18
NodeName=sh-114-03 Name=gpu File=/dev/nvidia[2-3] CPUs=1,3,5,7,9,11,13,15,17,19

we can submit 4 jobs using 1 GPU each, and each job gets a CPU id that
matches its allocated GPU:

$ sbatch --array=1-4 -p gpu -w sh-114-03 --gres=gpu:1 --wrap="sleep 100"
Submitted batch job 2669681

$ scontrol -dd show job 2669681 | grep CPU_ID | sort
    Nodes=sh-114-03 CPU_IDs=0 Mem=12800 GRES_IDX=gpu(IDX:0)
    Nodes=sh-114-03 CPU_IDs=1 Mem=12800 GRES_IDX=gpu(IDX:2)
    Nodes=sh-114-03 CPU_IDs=2 Mem=12800 GRES_IDX=gpu(IDX:1)
    Nodes=sh-114-03 CPU_IDs=3 Mem=12800 GRES_IDX=gpu(IDX:3)
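
One thing worth double-checking on your side: with TaskPlugin=task/cgroup,
CPUs are only actually confined if cgroup.conf enables it. A minimal
sketch of the relevant settings, in case they're not already set:

# in cgroup.conf
ConstrainCores=yes
ConstrainDevices=yes

(ConstrainDevices additionally hides the non-allocated /dev/nvidia*
devices from the job, which is handy on GPU nodes.)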

How do you check which GPU your job has been allocated?
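
On our side, we'd typically check from within the job itself, with
something like this (Slurm sets CUDA_VISIBLE_DEVICES for GPU jobs, and
the cgroup path below is the default layout, matching the one you
quoted above):

$ srun -p gpu --gres=gpu:1 bash -c 'echo $CUDA_VISIBLE_DEVICES; cat /sys/fs/cgroup/cpuset/slurm/uid_$UID/job_$SLURM_JOB_ID/cpuset.cpus'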

Cheers,
-- 
Kilian
