Just released a new version of the plugin. Our cluster has been upgraded to 21.08.6 and the cgroups structure is different. Fixed in latest release:
 * Tested on 21.08 and 20.11
Regards

> On 4 Apr 2022, at 09:20, Bas van der Vlies <bas.vandervl...@surf.nl> wrote:
>
> We have the exact same request for our GPUS that are not A100 and we have
> developed a lua plugin for our needs (The new slurm version will also allow
> the 22.XX). But for earlier version:
>  * https://github.com/basvandervlies/surf_slurm_mps
>
>
>
> On 03/04/2022 23:19, Kamil Wilczek wrote:
>> Hello!
>> I am an administrator of a GPU cluster (Slurm version 19.05.5).
>> Could someone help me a little bit and explain if a single
>> GPU can be shared between multiple users? My experience and
>> documentation tells me that it is not possible. But even after
>> some time Slurm is still a beast to me and I find myself
>> struggling :)
>> * I setup the cluster to assign GPUs on multi-GPU servers
>> to different users using GRES. This works fine and several
>> users can work on a multi-GPU machine (--gres=gpu:N/--gpu:N).
>> * But sometimes I have requests to allow a group of students
>> to work simultaneously, interactively on a small partition,
>> where there is more users than GPUs. So I thought that maybe
>> an MPS is a solutions, but the docs says that MPS is a way
>> to run multiple jobs of *the same* user on a single GPU.
>> When another user is requesting a GPU by MPS, the job is enqueued
>> and waiting for the first users' MPS server to finish.
>> So, this is not a solution for a multi-user, simultaneous/parallel
>> environment, right?
>> Is there a way to share a GPU between multiple users?
>> The requirement is, say:
>> * 16 users working interactively, simultaneously
>> * 4 GPUs partition
>> Kind Regards
>
> --
> Bas van der Vlies
> | HPCV Supercomputing | Internal Services | SURF |
> https://userinfo.surfsara.nl |
> | Science Park 140 | 1098 XG Amsterdam | Phone: +31208001300 |
> | bas.vandervl...@surf.nl