Hi,

If by "share the GPU" you mean exclusive allocation of the GPU to a single job,
then I believe you are missing the cgroup configuration that isolates access to
the GPU.


Below are the relevant parts (I believe) of our configuration.


There is also a way to time- and space-slice GPUs, but I guess you should first
get things set up without slicing.
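
As a rough sketch of what slicing could look like: Slurm 19.05 and later have
an "mps" GRES for sharing one GPU between jobs through NVIDIA's Multi-Process
Service. The lines below are an assumption based on the GRES documentation, not
part of our configuration. The node name and device file are taken from your
message, and the Count of 100 is an arbitrary illustrative value.

==> /etc/slurm/slurm.conf (sketch, assumed values) <==
# Declare both GRES types; mps:100 splits the single GPU into 100 shares.
GresTypes=gpu,mps
NodeName=shavak-DIT400TR-55L CPUs=64 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=95311 Gres=gpu:1,mps:100

==> /etc/slurm/gres.conf (sketch, assumed values) <==
# Both entries point at the same device file so the shares map onto that GPU.
NodeName=shavak-DIT400TR-55L Name=gpu File=/dev/nvidia0
NodeName=shavak-DIT400TR-55L Name=mps Count=100 File=/dev/nvidia0

With something like this in place, a job would ask for part of the GPU with,
for example, "--gres=mps:50", or for the whole device with "--gres=gpu:1". As
far as I know, a single job cannot request both at once.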


I hope this helps.


Manuel


==> /etc/slurm/cgroup.conf <==
# https://bugs.schedmd.com/show_bug.cgi?id=3701
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf"

==> /etc/slurm/cgroup_allowed_devices_file.conf <==
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/dev/nvidia*


==> /etc/slurm/slurm.conf <==

ProctrackType=proctrack/cgroup

# Memory is enforced via cgroups, so we should not also enforce it here; see [*]
#
# /etc/slurm/cgroup.conf: ConstrainRAMSpace=yes
#
# [*] https://bugs.schedmd.com/show_bug.cgi?id=5262
JobAcctGatherParams=NoOverMemoryKill

TaskPlugin=task/cgroup

JobAcctGatherType=jobacct_gather/cgroup
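
A minimal sketch of the cgroup.conf lines that, as far as I understand, turn on
the device isolation itself. ConstrainDevices=yes is an assumption here: it
does not appear in the excerpt above, so treat this as an illustration and not
as a copy of our file.

==> /etc/slurm/cgroup.conf (sketch, not copied from our system) <==
# Assumption: with TaskPlugin=task/cgroup, this restricts each job to the
# devices it was allocated, e.g. the GPU listed under File= in gres.conf.
ConstrainDevices=yes
# Referenced in the slurm.conf comment above: enforce memory limits via cgroups.
ConstrainRAMSpace=yes

With this in place, a job submitted without "--gres=gpu:1" should no longer be
able to open /dev/nvidia0, while a job that does request the GPU should see it
and get CUDA_VISIBLE_DEVICES set. slurmd needs a restart after the change.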


--
Dr. Manuel Holtgrewe, Dipl.-Inform.
Bioinformatician
Core Unit Bioinformatics – CUBI
Berlin Institute of Health / Max Delbrück Center for Molecular Medicine in the 
Helmholtz Association / Charité – Universitätsmedizin Berlin

Visiting Address: Invalidenstr. 80, 3rd Floor, Room 03 028, 10117 Berlin
Postal Address: Chariteplatz 1, 10117 Berlin

E-Mail: manuel.holtgr...@bihealth.de
Phone: +49 30 450 543 607
Fax: +49 30 450 7 543 901
Web: cubi.bihealth.org  www.bihealth.org  www.mdc-berlin.de  www.charite.de
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Analabha 
Roy <hariseldo...@gmail.com>
Sent: Wednesday, February 1, 2023 6:12:40 PM
To: slurm-users@lists.schedmd.com
Subject: [ext] [slurm-users] Enforce gpu usage limits (with GRES?)

Hi,

I'm new to slurm, so I apologize in advance if my question seems basic.

I just purchased a single node 'cluster' consisting of one 64-core cpu and an 
nvidia rtx5k gpu (Turing architecture, I think). The vendor supplied it with 
ubuntu 20.04 and slurm-wlm 19.05.5. Now I'm trying to adjust the config to suit 
the needs of my department.

I'm trying to bone up on GRES scheduling by reading this manual 
page<https://slurm.schedmd.com/gres.html>, but am confused about some things.

My slurm.conf file has the following lines put in it by the vendor:

###################
# COMPUTE NODES
GresTypes=gpu
NodeName=shavak-DIT400TR-55L CPUs=64 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=1 RealMemory=95311 Gres=gpu:1
#PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP

PartitionName=CPU Nodes=ALL Default=Yes MaxTime=INFINITE  State=UP

PartitionName=GPU Nodes=ALL Default=NO MaxTime=INFINITE  State=UP
#####################

So they created two partitions that are essentially identical. Secondly, they 
put just the following line in gres.conf:

###################
NodeName=shavak-DIT400TR-55L      Name=gpu        File=/dev/nvidia0
###################

That's all. However, this configuration does not appear to constrain anyone in 
any manner. As a regular user, I can still use srun or sbatch to start GPU jobs 
from the "CPU partition," and nvidia-smi says that a simple 
cupy<https://cupy.dev/> script that multiplies matrices and starts as an sbatch 
job in the CPU partition can access the gpu just fine. Note that the 
environment variable "CUDA_VISIBLE_DEVICES" does not appear to be set in any 
job step. I tested this by starting an interactive srun shell in both CPU and 
GPU partitions and running "echo $CUDA_VISIBLE_DEVICES" and got bupkis for both.


What I need to do is constrain jobs to using chunks of GPU Cores/RAM so that 
multiple jobs can share the GPU.

As I understand from the gres manpage, simply adding "AutoDetect=nvml" (NVML 
should be installed with the NVIDIA HPC SDK, right? I installed it with 
apt-get...) in gres.conf should allow Slurm to detect the GPU's internal 
specifications automatically. Is that all, or do I need to configure an mps GRES 
as well? Will that succeed in jailing out the GPU from jobs that don't mention 
any gres parameters (perhaps by setting CUDA_VISIBLE_DEVICES), or is there any 
additional config for that? Do I really need that extra "GPU" partition that 
the vendor put in for any of this, or is there a way to bind GRES resources to 
a particular partition in such a way that simply launching jobs in that 
partition will be enough?

Thanks for your attention.
Regards
AR

--
Analabha Roy
Assistant Professor
Department of Physics<http://www.buruniv.ac.in/academics/department/physics>
The University of Burdwan<http://www.buruniv.ac.in/>
Golapbag Campus, Barddhaman 713104
West Bengal, India
Emails: dan...@utexas.edu<mailto:dan...@utexas.edu>, 
a...@phys.buruniv.ac.in<mailto:a...@phys.buruniv.ac.in>, 
hariseldo...@gmail.com<mailto:hariseldo...@gmail.com>
Webpage: http://www.ph.utexas.edu/~daneel/
