Re: [SGE-discuss] GPUs as a resource

Reuti Fri, 19 May 2017 05:36:37 -0700

Hi,

> On 18.05.2017 at 12:37, juanesteban.jime...@mdc-berlin.de wrote:
> 
> Ok, so I create a new queue, gpu.q, that only has that node, with the complex 
> value for the gpu. I removed the node from @allhosts so that the all.q and 
> interactive.q don’t use the node. I also modified the user list so that only 
> users authorized to use the GPU can use the node.
> 
> But now I am told that this is not recommended. ??

There are often several ways to implement a given policy. I use different 
queues mainly to offer different scratch locations on one and the same node 
(besides /scratch on traditional disks, one can get /ssd or /ramdisk too), 
while others prefer to split queues by purpose: interactive, batch or GPU 
jobs. Essentially it's a matter of personal taste.

So with the GPU: you can attach a FORCED resource per exechost or queue 
instance, and/or limit access by assigning an xuser_lists entry. Another 
option would be to put this policy in an RQS or JSV.
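
As a rough illustration of the FORCED resource option, here is a minimal 
Python sketch that only builds the text involved; it runs nothing against a 
cluster. The resource name "gpu", the host name "gpunode01", the count of 2 
and the script name "job.sh" are assumptions for the example, not taken from 
your setup. The row follows the column order of the complex configuration 
(name, shortcut, type, relop, requestable, consumable, default, urgency).

```python
def complex_row(name, shortcut, ctype="INT", relop="<=",
                requestable="FORCED", consumable="YES",
                default="0", urgency="0"):
    """Build one row for the complex configuration edited with `qconf -mc`."""
    return " ".join([name, shortcut, ctype, relop,
                     requestable, consumable, default, urgency])


def setup_commands(host, resource, count):
    """Return the commands an admin might run; nothing is executed here."""
    return [
        # Opens the complex configuration in an editor; add the row there.
        "qconf -mc",
        # Attach the consumable to the exec host (hypothetical host name).
        f"qconf -mattr exechost complex_values {resource}={count} {host}",
        # A job has to request the FORCED resource to land on that host.
        f"qsub -l {resource}=1 job.sh",
    ]


if __name__ == "__main__":
    print(complex_row("gpu", "gpu"))
    for command in setup_commands("gpunode01", "gpu", 2):
        print(command)
```

With requestable set to FORCED, jobs that do not request the resource should 
not be scheduled to the host or queue instance where it is defined, which is 
why the last command requests it explicitly.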

In your case I'm a little bit lost, as you started the thread IIRC with the 
complex being attached to an ACL.

Nevertheless, I think your case involves two setup challenges:

- Limit the access to certain nodes/queues.
- Track the usage of the GPUs on these nodes, so that each job gets a unique 
one.
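
For the second point, one possible approach (an assumption on my side, not 
something built into SGE) is to let the prolog claim a GPU index through a 
lock directory and let the epilog release it again. A minimal Python sketch, 
where the lock location, the function names and the job ids are hypothetical, 
and where handing the chosen index to the job (for example via 
CUDA_VISIBLE_DEVICES) is left out:

```python
import os
import tempfile


def _lock_dir(lock_root, index):
    return os.path.join(lock_root, f"gpu{index}.lock")


def claim_gpu(lock_root, gpu_count, job_id):
    """Claim the first free GPU index for job_id; return None if all are busy."""
    for index in range(gpu_count):
        path = _lock_dir(lock_root, index)
        try:
            os.mkdir(path)  # atomic: fails if another job holds this GPU
        except FileExistsError:
            continue
        with open(os.path.join(path, "owner"), "w") as fh:
            fh.write(str(job_id))
        return index
    return None


def release_gpu(lock_root, gpu_count, job_id):
    """Free every GPU lock owned by job_id; return the freed indices."""
    freed = []
    for index in range(gpu_count):
        path = _lock_dir(lock_root, index)
        owner_file = os.path.join(path, "owner")
        try:
            with open(owner_file) as fh:
                owner = fh.read().strip()
        except FileNotFoundError:
            continue  # not locked, or still being claimed by another job
        if owner == str(job_id):
            os.remove(owner_file)
            os.rmdir(path)
            freed.append(index)
    return freed


if __name__ == "__main__":
    root = tempfile.mkdtemp()              # stand-in for a node-local lock dir
    print(claim_gpu(root, 2, "101"))       # first job
    print(claim_gpu(root, 2, "102"))       # second job
    print(claim_gpu(root, 2, "103"))       # no GPU left
    print(release_gpu(root, 2, "101"))     # epilog of the first job
```

In a real prolog the job id would come from the JOB_ID environment variable, 
and the number of GPUs should match the consumable defined for the node so 
that the scheduler never starts more GPU jobs than there are locks to claim.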

As William mentions below: are these nodes exclusively reserved for dedicated 
users, or should other users be able to use them, but not the GPU?

-- Reuti

> Best regards,
> Juan Jimenez
> System Administrator, BIH HPC Cluster
> MDC Berlin / IT-Dept.
> Tel.: +49 30 9406 2800
> 
> On 17.05.17, 09:44, "William Hay" <w....@ucl.ac.uk> wrote:
> 
>    On Tue, May 16, 2017 at 08:07:15PM +0000, juanesteban.jime...@mdc-berlin.de wrote:
>> In our cluster we have one node with two Nvidia GPUs. I have been trying to 
>> figure out how to set them up as consumable resources tied to an ACL, but I 
>> can't get SGE to handle them correctly. It always says the resource is not 
>> available.
>> 
>> Can someone walk me through the steps required to set this up correctly? The 
>> docs I have found are rather cryptic.
>    Assuming you want other people to be able to use the node but not the GPUs,
>    I would think the process would be:
>    1) Define the resource in the complex_values of a queue that exists only on
>       the node in question.
>    2) Add the Grid Engine ACL to the queue.
>    3) Ensure all resources shared between gpu and non-gpu jobs (including
>       slots/cpus) are defined on the host rather than the queue.
> 
>    You might want to set up the prolog and epilog to twiddle the permissions
>    on the /dev/ files representing the GPUs so only the job can access them
>    to enforce access.
> 
> 
> 
>    William
> 
> 
> _______________________________________________
> SGE-discuss mailing list
> SGE-discuss@liv.ac.uk
> https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
> 

