Hello,

I've recently adopted setting AutoDetect=nvml in our GPU nodes' gres.conf
files to automatically populate Cores and Links for GPUs, which has been
working well.
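
For context, each node's gres.conf is now just the one AutoDetect line.
As I understand it, that ends up roughly equivalent to hand-writing
entries like the following (core ranges, types, and link counts here
are illustrative, not copied from our nodes):

    # gres.conf as deployed
    AutoDetect=nvml

    # roughly what nvml autodetection populates on a 4-GPU node,
    # if it were written out by hand:
    # socket 0: GPUs 0 and 1 are an NVLink pair
    Name=gpu Type=a6000 File=/dev/nvidia0 Cores=0-23  Links=-1,4,0,0
    Name=gpu Type=a6000 File=/dev/nvidia1 Cores=0-23  Links=4,-1,0,0
    # socket 1: GPUs 2 and 3 are an NVLink pair
    Name=gpu Type=a6000 File=/dev/nvidia2 Cores=24-47 Links=0,0,-1,4
    Name=gpu Type=a6000 File=/dev/nvidia3 Cores=24-47 Links=0,0,4,-1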

I'm now wondering whether I can prioritize scheduling single-GPU jobs
onto NVLink pairs (these are PCIe A6000s) where one GPU in the pair is
already running a single-GPU job, assuming the socket that pair has
affinity to still has enough free cores for the job. We have some users
who want to run single-GPU jobs and others who want to run dual-GPU
jobs on the same nodes, so for better job throughput we would prefer
not to configure each NVLink pair as a single GRES.

As it stands, I've observed that on a node with at least 4 GPUs and 2
sockets (one NVLink pair per socket), Slurm prioritizes evening out
core allocation between the sockets. Once a second single-GPU job is
submitted, one GPU in each NVLink pair is occupied, and a subsequent
dual-GPU job can still run but no longer has an NVLink pair available
to it.
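
To make that concrete, the sequence looks something like this (script
names, core counts, and GPU indices are made up for illustration):

    # job 1: gets GPU 0 on socket 0
    sbatch --gres=gpu:1 -c 8 single_gpu.sh
    # job 2: gets GPU 2 on socket 1 (evening out cores across sockets)
    #        rather than GPU 1 next to the already-busy GPU 0
    sbatch --gres=gpu:1 -c 8 single_gpu.sh
    # job 3: gets GPUs 1 and 3, which are not NVLink'd to each other
    sbatch --gres=gpu:2 -c 16 dual_gpu.sh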

We've also got a few nodes where individual GPUs have failed, leaving
some NVLink'd pairs and usually a single non-NVLink'd GPU (3 or 7 GPUs
total). It'd be ideal if single-GPU jobs were also prioritized onto the
non-NVLink'd GPU in that case.
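
(On those nodes the pairing is easy to confirm with something like:

    nvidia-smi topo -m

which shows an NV# entry between the bridged pair and a plain PCIe or
system path such as PHB or SYS for the odd GPU out.)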

Is this possible?

All the best,
Matthew

-- 
Matthew Baney
Assistant Director of Computational Systems
mba...@umd.edu | (301) 405-6756
University of Maryland Institute for Advanced Computer Studies
3154 Brendan Iribe Center
8125 Paint Branch Dr.
College Park, MD 20742