We have the exact same request for our GPUs that are not A100s, and we have developed a Lua plugin for our needs (the new Slurm version, 22.XX, will also allow this). But for earlier versions:
 * https://github.com/basvandervlies/surf_slurm_mps
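
For context, here is a minimal, hypothetical sketch of the job_submit/lua
hook that such a plugin builds on. It is *not* the surf_slurm_mps code
itself: the partition name, the share count and the job_desc.gres field
name (newer Slurm versions populate tres_per_node instead) are assumptions,
and it only illustrates the hook mechanism:

  -- job_submit.lua: illustrative job_submit/lua hook, not the real plugin.
  -- On a hypothetical shared partition, rewrite a plain GPU request into
  -- an MPS share request so several jobs can land on one device.
  function slurm_job_submit(job_desc, part_list, submit_uid)
     if job_desc.partition == "gpu_shared" and job_desc.gres ~= nil and
        string.match(job_desc.gres, "^gpu") then
        job_desc.gres = "mps:50"  -- half of the 100 shares of one GPU
        slurm.log_info("job_submit: rewrote gres to %s for uid %u",
                       job_desc.gres, submit_uid)
     end
     return slurm.SUCCESS
  end

  function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
     return slurm.SUCCESS
  end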



On 03/04/2022 23:19, Kamil Wilczek wrote:
Hello!

I am an administrator of a GPU cluster (Slurm version 19.05.5).

Could someone help me a little and explain whether a single
GPU can be shared between multiple users? My experience and
the documentation tell me that it is not possible. But even after
some time Slurm is still a beast to me and I find myself
struggling :)

* I set up the cluster to assign GPUs on multi-GPU servers
   to different users using GRES. This works fine and several
   users can work on a multi-GPU machine (--gres=gpu:N / --gpus=N).

* But sometimes I get requests to allow a group of students
   to work simultaneously and interactively on a small partition
   where there are more users than GPUs. So I thought that maybe
   MPS is a solution, but the docs say that MPS is a way
   to run multiple jobs of *the same* user on a single GPU.
   When another user requests a GPU via MPS, the job is enqueued
   and waits for the first user's MPS server to finish.
   So this is not a solution for a multi-user, simultaneous/parallel
   environment, right?
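
   For reference, the stock MPS setup from the Slurm gres.conf docs
   looks roughly like this (the node name and share counts are made up);
   even with this in place, Slurm hands all of a GPU's MPS shares to
   one user at a time, which is exactly the limitation above:

      # slurm.conf (excerpt)
      GresTypes=gpu,mps
      NodeName=gpunode01 Gres=gpu:4,mps:400 ...

      # gres.conf on gpunode01
      Name=gpu File=/dev/nvidia[0-3]
      Name=mps Count=400      # 100 shares per GPU, over the 4 devices

      # a job asking for half of one GPU's MPS shares
      srun --gres=mps:50 --pty bash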

Is there a way to share a GPU between multiple users?
The requirement is, say:

* 16 users working interactively, simultaneously
* a partition with 4 GPUs

Kind Regards

--
Bas van der Vlies
| HPCV Supercomputing | Internal Services | SURF | https://userinfo.surfsara.nl |
| Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
|  bas.vandervl...@surf.nl
