Hi Dj,

we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script, according to what the job has requested.
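Roughly: the prolog works out how many cards the job was granted, claims that many free devices on the node, and exports only those via CUDA_VISIBLE_DEVICES. A minimal sketch of the idea (untested as written - the lock directory under /var/lib/gpu_locks is just a made-up bookkeeping example, the qstat parsing is crude, and you should verify that appending to the "environment" file in the job spool dir still propagates the variable to the job on 8.1.9):

#!/bin/bash
# prolog - configured via the "prolog" parameter (qconf -mconf / qconf -mq)

# how many GPUs did the job request? (crude parse of the hard resource list)
NGPUS=$(qstat -j $JOB_ID 2>/dev/null | sed -n 's/.*gpu=\([0-9][0-9]*\).*/\1/p' | head -1)
[ -z "$NGPUS" ] && exit 0                   # no GPU requested, nothing to do

LOCKDIR=/var/lib/gpu_locks                  # hypothetical per-node bookkeeping
mkdir -p "$LOCKDIR"

GRANTED=""
COUNT=0
for id in 0 1 2 3; do                       # the four cards in the new node
    [ "$COUNT" -ge "$NGPUS" ] && break
    if mkdir "$LOCKDIR/gpu$id" 2>/dev/null; then      # atomically claim a free card
        echo "$JOB_ID" > "$LOCKDIR/gpu$id/owner"      # so an epilog can release it
        GRANTED="${GRANTED:+$GRANTED,}$id"
        COUNT=$((COUNT+1))
    fi
done

# hand the selection to the job
echo "CUDA_VISIBLE_DEVICES=$GRANTED" >> "$SGE_JOB_SPOOL_DIR/environment"

# (the device-permission idea mentioned below would also go here: chmod/chown
#  the /dev/nvidia* files of the cards the job was *not* granted, and restore
#  them in the epilog)

A matching epilog would then remove the $LOCKDIR/gpu* directories owned by the finishing job.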
Preventing access to the 'wrong' gpu devices by "malicious jobs" is not that easy. An idea could be, e.g., to play with device permissions.

Cheers,
Andreas

On Wed, 2019-08-14 at 10:21 -0400, Dj Merrill wrote:
> To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
> single Nvidia GPU cards per compute node. We are contemplating the
> purchase of a single compute node that has multiple GPU cards in it, and
> want to ensure that running jobs only have access to the GPU resources
> they ask for, and don't take over all of the GPU cards in the system.
>
> We define gpu as a resource:
> qconf -sc:
> #name   shortcut   type   relop   requestable   consumable   default   urgency
> gpu     gpu        INT    <=      YES           YES          0         0
>
> We define GPU persistence mode and exclusive process on each node:
> nvidia-smi -pm 1
> nvidia-smi -c 3
>
> We set the number of GPUs in the host definition:
> qconf -me (hostname)
>
> complex_values gpu=1 for our existing nodes, and this setup has been
> working fine for us.
>
> With the new system, we would set:
> complex_values gpu=4
>
> If a job is submitted asking for one GPU, will it be limited to only
> having access to a single GPU card on the system, or can it detect the
> other cards and take up all four (and if so, how do we prevent that)?
>
> Is there something like "cgroups" for gpus?
>
> Thanks,
>
> -Dj
>
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
-- 
| Andreas Haupt      | E-Mail: andreas.ha...@desy.de
| DESY Zeuthen       | WWW:    http://www-zeuthen.desy.de/~ahaupt
| Platanenallee 6    | Phone:  +49/33762/7-7359
| D-15738 Zeuthen    | Fax:    +49/33762/7-7216