Hello, from a kernel/mechanism point of view, it is perfectly possible to restrict device access using cgroups. I use that on my current cluster, works really well (both for things like CPU cores and GPUs - you only see what you request, even using something like 'nvidia-smi').
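
To give a rough idea of what that looks like at the kernel level, here is a minimal sketch of the deny rules involved. It assumes the cgroup v1 "devices" controller (cgroup v2 handles device access differently) and the usual NVIDIA character-device major number 195 for /dev/nvidiaN. The function names and the job cgroup path are made up for illustration; this is not what any particular scheduler does internally.

```python
from pathlib import Path

NVIDIA_MAJOR = 195  # character-device major number of /dev/nvidiaN


def gpu_deny_rules(total_gpus, allowed):
    """Return cgroup v1 devices.deny rules for every GPU not in `allowed`."""
    allowed = set(allowed)
    unknown = allowed - set(range(total_gpus))
    if unknown:
        raise ValueError(f"no such GPU index: {sorted(unknown)}")
    # One rule per hidden GPU: c = character device, rwm = read/write/mknod.
    return [f"c {NVIDIA_MAJOR}:{minor} rwm"
            for minor in range(total_gpus) if minor not in allowed]


def apply_rules(job_cgroup, rules):
    """Write the rules into the job's cgroup (hypothetical path, needs root)."""
    deny_file = Path(job_cgroup) / "devices.deny"
    for rule in rules:
        # The devices controller takes one rule per write.
        deny_file.write_text(rule + "\n")


if __name__ == "__main__":
    # A 4-GPU node where the job was granted GPU 2 only.
    for rule in gpu_deny_rules(4, [2]):
        print(rule)
```

Once rules like these are in place for the job's cgroup, the other cards are not accessible to processes in that cgroup, which is why 'nvidia-smi' only lists the requested ones.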
Sadly, my current cluster isn't Grid Engine based :( and I have no idea if SoGE or UGE support doing so out of the box - I've never had to do that whilst still working with Grid Engine. Wouldn't be surprised if UGE can do it.

You could probably script something yourself - I know I made a custom suspend method once that used cgroups for non-MPI jobs.

Tina

On 14/08/2019 15:35, Andreas Haupt wrote:
> Hi Dj,
>
> we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and
> according to what has been requested by the job).
>
> Preventing access to the 'wrong' gpu devices by "malicious jobs" is not
> that easy. An idea could be to e.g. play with device permissions.
>
> Cheers,
> Andreas
>
> On Wed, 2019-08-14 at 10:21 -0400, Dj Merrill wrote:
>> To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
>> single Nvidia GPU cards per compute node. We are contemplating the
>> purchase of a single compute node that has multiple GPU cards in it, and
>> want to ensure that running jobs only have access to the GPU resources
>> they ask for, and don't take over all of the GPU cards in the system.
>>
>> We define gpu as a resource:
>> qconf -sc:
>> #name   shortcut   type   relop   requestable   consumable   default   urgency
>> gpu     gpu        INT    <=      YES           YES          0         0
>>
>> We define GPU persistence mode and exclusive process on each node:
>> nvidia-smi -pm 1
>> nvidia-smi -c 3
>>
>> We set the number of GPUs in the host definition:
>> qconf -me (hostname)
>>
>> complex_values   gpu=1   for our existing nodes, and this setup has been
>> working fine for us.
>>
>> With the new system, we would set:
>> complex_values   gpu=4
>>
>>
>> If a job is submitted asking for one GPU, will it be limited to only
>> having access to a single GPU card on the system, or can it detect the
>> other cards and take up all four (and if so how do we prevent that)?
>>
>> Is there something like "cgroups" for gpus?
>>
>> Thanks,
>>
>> -Dj
>>
>>
>> _______________________________________________
>> users mailing list
>> users@gridengine.org
>> https://gridengine.org/mailman/listinfo/users