Re: [slurm-users] NVIDIA MIG question

2022-11-17 Thread Groner, Rob
just 1 gpu, without them going to pending (until all gpus are used up). Rob From: slurm-users on behalf of Groner, Rob Sent: Thursday, November 17, 2022 10:08 AM To: Slurm User Community List Subject: Re: [slurm-users] NVIDIA MIG question No, I can't s

Re: [slurm-users] NVIDIA MIG question

2022-11-17 Thread Groner, Rob
The first 2 go fine, but any after that go to pending, even though there should be 4 available (according to sinfo output). Rob From: slurm-users on behalf of Yair Yarom Sent: Thursday, November 17, 2022 8:19 AM To: Slurm User Community List Subject: Re: [slurm-us

Re: [slurm-users] NVIDIA MIG question

2022-11-17 Thread Yair Yarom
0 --account=1gc5gb > --partition=sla-prio > salloc: Job allocation 5015 has been revoked. > salloc: error: Job submit/allocate failed: Requested node configuration is > not available > > > Rob > > -- > *From:* slurm-users on behalf of > Ya

Re: [slurm-users] NVIDIA MIG question

2022-11-16 Thread Groner, Rob
ration is not available Rob From: slurm-users on behalf of Yair Yarom Sent: Wednesday, November 16, 2022 3:48 AM To: Slurm User Community List Subject: Re: [slurm-users] NVIDIA MIG question

Re: [slurm-users] NVIDIA MIG question

2022-11-16 Thread Yair Yarom
Hi, From what we observed, Slurm sees the MIGs each as a distinct gres/gpu. So you can have 14 jobs each using a different MIG. However (unless something has changed in the past year), due to NVIDIA limitations, a single process can't access more than one MIG simultaneously (this is unrelated to

Re: [slurm-users] NVIDIA MIG question

2022-11-15 Thread Laurence
Hi Rob, Yes, those questions make sense. From what I understand, MIG should essentially split the GPU so that they behave as separate cards. Hence two different users should be able to use two different MIG instances at the same time and also a single job could use all 14 instances. The resu

[slurm-users] NVIDIA MIG question

2022-11-15 Thread Groner, Rob
We have successfully used the nvidia-smi tool to take the 2 A100s in a node and split them into multiple GPU devices. In one case, we split the 2 GPUs into 7 MIG devices each, so 14 in that node total, and in the other case, we split the 2 GPUs into 2 MIG devices each, so 4 total in the node.
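
For readers following along, the two splits described above could be sketched roughly as follows. This is only an illustration, not the commands actually run on the poster's nodes: the thread does not say which MIG profiles were used, so the profile names `1g.5gb` (7 per GPU) and `3g.20gb` (2 per GPU) assume 40 GB A100s, the GPU indices 0 and 1 are assumed, and the helper names `mig_split_commands` and `total_mig_devices` are hypothetical. The script only builds the `nvidia-smi` command strings and counts the resulting devices; it does not execute anything.

```python
from typing import List


def mig_split_commands(gpu_indices: List[int], profile: str, per_gpu: int) -> List[str]:
    """Build (but do not run) nvidia-smi commands that split each GPU into MIG devices.

    `profile` is an assumed MIG profile name (e.g. "1g.5gb" on a 40 GB A100);
    the original post does not state which profiles were used.
    """
    commands = []
    for gpu in gpu_indices:
        # Enable MIG mode on this GPU.
        commands.append(f"nvidia-smi -i {gpu} -mig 1")
        # Create `per_gpu` GPU instances of the given profile, plus the
        # default compute instance for each (-C).
        profiles = ",".join([profile] * per_gpu)
        commands.append(f"nvidia-smi mig -i {gpu} -cgi {profiles} -C")
    return commands


def total_mig_devices(num_gpus: int, per_gpu: int) -> int:
    """Number of MIG devices on a node when every GPU is split the same way."""
    return num_gpus * per_gpu


if __name__ == "__main__":
    # Case 1 from the post: 2 GPUs split into 7 MIG devices each.
    for cmd in mig_split_commands([0, 1], "1g.5gb", 7):
        print(cmd)
    print(total_mig_devices(2, 7))  # 14

    # Case 2 from the post: 2 GPUs split into 2 MIG devices each.
    for cmd in mig_split_commands([0, 1], "3g.20gb", 2):
        print(cmd)
    print(total_mig_devices(2, 2))  # 4
```

The totals (14 and 4) are the device counts that, per the replies in this thread, Slurm would then see as distinct gres/gpu entries.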