just 1 GPU, without them going to pending (until all GPUs are used up).
Rob
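For reference, a minimal sketch of that kind of test, reusing the partition and account names that appear later in the thread (sla-prio and 1gc5gb are placeholders for whatever is configured locally) and assuming each MIG shows up as one gres/gpu:

    # submit four single-GPU batch jobs; each should land on its own MIG
    for i in 1 2 3 4; do
        sbatch --partition=sla-prio --account=1gc5gb --gres=gpu:1 \
               --wrap="nvidia-smi -L; sleep 300"
    done

    # check whether any of them sit in PENDING while GPUs remain free
    squeue -u "$USER" -o "%.10i %.9T %.20r"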
From: slurm-users on behalf of Groner, Rob
Sent: Thursday, November 17, 2022 10:08 AM
To: Slurm User Community List
Subject: Re: [slurm-users] NVIDIA MIG question
No, I can't s
The first 2 go fine, but any after that go to pending, even though there should be 4 available (according to sinfo output).
Rob
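A quick way to see why the extra jobs pend and what GRES Slurm thinks is free (the node name below is a placeholder):

    # pending reason for the queued jobs
    squeue -u "$USER" --state=PD -o "%.10i %.9T %r"

    # what the node advertises vs. what is already allocated
    scontrol show node gpu-node01 | grep -iE "gres|alloctres"

    # gres and state per node, as sinfo reports it
    sinfo -N -o "%N %G %t"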
From: slurm-users on behalf of Yair Yarom
Sent: Thursday, November 17, 2022 8:19 AM
To: Slurm User Community List
Subject: Re: [slurm-users] NVIDIA MIG question
0 --account=1gc5gb
> --partition=sla-prio
> salloc: Job allocation 5015 has been revoked.
> salloc: error: Job submit/allocate failed: Requested node configuration is not available
>
>
> Rob
>
configuration is not available
Rob
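That error usually means the request doesn't match any node's advertised GRES. A sketch of the two request forms worth comparing, assuming the MIGs are exposed as gres/gpu (the 1g.5gb type string is a guess; use whatever sinfo actually reports):

    # see exactly which gres strings the partition's nodes advertise
    sinfo -N -p sla-prio -o "%N %G"

    # untyped request: any one gpu gres (i.e. any MIG)
    salloc --partition=sla-prio --account=1gc5gb --gres=gpu:1

    # typed request: only valid if the type matches the advertised gres
    salloc --partition=sla-prio --account=1gc5gb --gres=gpu:1g.5gb:1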
From: slurm-users on behalf of Yair Yarom
Sent: Wednesday, November 16, 2022 3:48 AM
To: Slurm User Community List
Subject: Re: [slurm-users] NVIDIA MIG question
Hi,
From what we observed, Slurm sees each MIG as a distinct gres/gpu, so you can have 14 jobs each using a different MIG.
However (unless something has changed in the past year), due to NVIDIA limitations, a single process can't access more than one MIG simultaneously (this is unrelated to Slurm).
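A sketch of how that typically looks on the Slurm side, assuming Slurm 21.08 or newer built with NVML support; the node name and GPU count line are assumptions (autodetect names each MIG after its profile):

    # gres.conf on the GPU node: let NVML enumerate the MIG instances,
    # each one becoming its own gres/gpu
    AutoDetect=nvml

    # slurm.conf: advertise 14 GPUs on the node (7 MIGs per A100, 2 A100s)
    GresTypes=gpu
    NodeName=gpu-node01 Gres=gpu:14 ...

    # verify what slurmd actually detected
    scontrol show node gpu-node01 | grep -i gres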
Hi Rob,
Yes, those questions make sense. From what I understand, MIG should essentially split the GPU so that the instances behave as separate cards. Hence two different users should be able to use two different MIG instances at the same time, and a single job could also use all 14 instances. The resu
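A sketch of both cases, assuming each MIG is one gres/gpu and my_app is a placeholder; per the note above, the 14-MIG job needs one process per MIG, since a single process only talks to one MIG:

    # two jobs (e.g. from two users) each take one MIG at the same time
    srun --gres=gpu:1 nvidia-smi -L &
    srun --gres=gpu:1 nvidia-smi -L &
    wait

    # one job holds all 14 MIGs and launches 14 tasks, one per MIG; how each
    # task gets pinned to its MIG (gpu binding, CUDA_VISIBLE_DEVICES) depends
    # on the local cgroup/gres configuration
    sbatch --gres=gpu:14 --ntasks=14 --wrap="srun ./my_app"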
We have successfully used the nvidia-smi tool to take the 2 A100s in a node and split them into multiple GPU devices. In one case, we split the 2 GPUs into 7 MIG devices each, so 14 in that node total, and in the other case, we split the 2 GPUs into 2 MIG devices each, so 4 total in the node.
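For completeness, a sketch of the nvidia-smi side of that split; the 1g.5gb and 3g.20gb profile names assume A100 40GB cards, and the commands need root plus an idle GPU:

    # enable MIG mode on both GPUs (may require a GPU reset)
    sudo nvidia-smi -i 0,1 -mig 1

    # 7-way split: seven 1g.5gb GPU instances per card, with -C also creating
    # the matching compute instances (14 MIG devices in total)
    sudo nvidia-smi mig -i 0 -cgi 1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb -C
    sudo nvidia-smi mig -i 1 -cgi 1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb -C

    # 2-way split variant: two larger instances per card, 4 in total
    # sudo nvidia-smi mig -i 0 -cgi 3g.20gb,3g.20gb -C

    # confirm what was created
    nvidia-smi -L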