Hello,

from a kernel/mechanism point of view, it is perfectly possible to 
restrict device access using cgroups. I use that on my current cluster 
and it works really well (both for things like CPU cores and GPUs - you 
only see what you request, even via something like 'nvidia-smi').

Sadly, my current cluster isn't Grid Engine based :( and I have no idea 
if SoGE or UGE support doing so out of the box - I've never had to do 
that whilst still working with Grid Engine. Wouldn't be surprised if UGE 
can do it.

You could probably script something yourself - I know I made a custom 
suspend method once that used cgroups for non-MPI jobs.
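If you go the scripting route, a rough sketch of the idea follows, using the cgroup v1 "devices" controller. The cgroup path, the job's cgroup name, and the bookkeeping of which GPU the job was granted are all assumptions here - only the NVIDIA character-device major number (195, so /dev/nvidia0 is 195:0) is fixed. Writing the rules requires root, so that part is commented out:

```shell
#!/bin/sh
# Sketch: hide all NVIDIA GPU device nodes from a job except the one it
# requested, via the cgroup v1 "devices" controller.
# Assumptions (not tied to any particular Grid Engine version):
#   - NVIDIA GPUs are character devices with major number 195
#   - the first argument is the minor number of the granted GPU
#   - the job's cgroup already exists (path below is hypothetical)

deny_rules() {
    # Emit a "deny" rule for every GPU minor except the allowed one.
    allowed="$1"
    total="$2"
    i=0
    while [ "$i" -lt "$total" ]; do
        [ "$i" -ne "$allowed" ] && echo "c 195:$i rwm"
        i=$((i + 1))
    done
}

# Example: the job was granted GPU 1 on a 4-GPU node.
deny_rules 1 4

# In a real prolog you would write each rule into the job's cgroup
# (requires root; the cgroup path here is made up):
# deny_rules 1 4 | while read -r rule; do
#     echo "$rule" > "/sys/fs/cgroup/devices/sge/$JOB_ID/devices.deny"
# done
```

With that in place, open() on the denied /dev/nvidiaN nodes fails for processes in the job's cgroup, which is what makes nvidia-smi only show the granted card.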

Tina

On 14/08/2019 15:35, Andreas Haupt wrote:
> Hi Dj,
> 
> we do this by setting $CUDA_VISIBLE_DEVICES in a prolog script (and
> according to what has been requested by the job).
> 
> Preventing access to the 'wrong' GPU devices by "malicious jobs" is not
> that easy. One idea would be to play with device permissions.
> 
> Cheers,
> Andreas
> 
> On Wed, 2019-08-14 at 10:21 -0400, Dj Merrill wrote:
>> To date in our HPC Grid running Son of Grid Engine 8.1.9, we've had
>> single Nvidia GPU cards per compute node.  We are contemplating the
>> purchase of a single compute node that has multiple GPU cards in it, and
>> want to ensure that running jobs only have access to the GPU resources
>> they ask for, and don't take over all of the GPU cards in the system.
>>
>> We define gpu as a resource:
>> qconf -sc:
>> #name  shortcut  type  relop  requestable  consumable  default  urgency
>> gpu    gpu       INT   <=     YES          YES          0        0
>>
>> We define GPU persistence mode and exclusive process on each node:
>> nvidia-smi -pm 1
>> nvidia-smi -c 3
>>
>> We set the number of GPUs in the host definition:
>> qconf -me (hostname)
>>
>> complex_values   gpu=1   for our existing nodes, and this setup has been
>> working fine for us.
>>
>> With the new system, we would set:
>> complex_values   gpu=4
>>
>>
>> If a job is submitted asking for one GPU, will it be limited to only
>> having access to a single GPU card on the system, or can it detect the
>> other cards and take up all four (and if so how do we prevent that)?
>>
>> Is there something like "cgroups" for gpus?
>>
>> Thanks,
>>
>> -Dj
>>
>>
>> _______________________________________________
>> users mailing list
>> users@gridengine.org
>> https://gridengine.org/mailman/listinfo/users
>>
