[slurm-users] Re: Can Not Use A Single GPU for Multiple Jobs

Brian Andrus via slurm-users Thu, 20 Jun 2024 10:48:57 -0700

Well, if I am reading this right, it makes sense.

Every job needs at least 1 core just to run, and if there are only 4 cores on the machine, one would expect a maximum of 4 jobs to run at once.
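
To make the arithmetic concrete, here is a rough sketch of the two limits involved. It assumes each job requests exactly one core and 100 shards, and it ignores memory and every other scheduling constraint. The function name and parameters are hypothetical and are not part of Slurm.

```python
def max_concurrent_jobs(total_cpus, cpus_per_job, total_shards, shards_per_job):
    """Upper bound on simultaneous jobs under a simplified two-resource model."""
    # How many jobs fit if only CPU cores were limiting.
    cpu_bound = total_cpus // cpus_per_job
    # How many jobs fit if only GPU shards were limiting.
    shard_bound = total_shards // shards_per_job
    # A job needs both resources, so the tighter limit wins.
    return min(cpu_bound, shard_bound)


# Numbers from the quoted message: 4 cores, 3500 shards, 100 shards per job.
print(max_concurrent_jobs(4, 1, 3500, 100))  # prints 4
```

Under these assumptions the shard limit alone would allow 35 jobs, but the core limit caps it at 4, which matches what is being observed.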


Brian Andrus

On 6/20/2024 5:24 AM, Arnuld via slurm-users wrote:

I have a machine with a quad-core CPU and an Nvidia GPU with 3500+ cores. I want to run around 10 jobs in parallel on the GPU (mostly CUDA-based jobs).
PROBLEM: Each job asks for only 100 shards (and usually runs for a minute or so), so I should be able to run 3500/100 = 35 jobs in parallel, but Slurm runs only 4 jobs in parallel and keeps the rest in the queue.
I have this in slurm.conf and gres.conf:

# GPU
GresTypes=gpu,shard
# COMPUTE NODES
PartitionName=pzero Nodes=ALL Default=YES MaxTime=INFINITE State=UP
PartitionName=pgpu Nodes=hostgpu MaxTime=INFINITE State=UP
NodeName=hostgpu NodeAddr=x.x.x.x Gres=gpu:gtx_1080_ti:1,shard:3500 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=64255 State=UNKNOWN
----------------------
Name=gpu Type=gtx_1080_ti File=/dev/nvidia0 Count=1
Name=shard Count=3500  File=/dev/nvidia0


--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
