Hi,

> On 18.05.2017 at 12:37, juanesteban.jime...@mdc-berlin.de wrote:
>
> Ok, so I create a new queue, gpu.q, that only has that node, with the complex
> value for the gpu. I removed the node from @allhosts so that the all.q and
> interactive.q don’t use the node. I also modified the user list so that only
> users authorized to use the GPU can use the node.
>
> But now I am told that this is not recommended. ??
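For reference, the setup described in the quoted message might look roughly like the fragment below. This is only a sketch: the host name `gpunode01` and the ACL name `gpu_users` are made up for illustration, and the complex is assumed to be called `gpu`. The first part is an entry in the format shown by `qconf -sc`, the second part lists the relevant fields of the queue as shown by `qconf -sq gpu.q`.

```
#name  shortcut  type  relop  requestable  consumable  default  urgency
gpu    gpu       INT   <=     FORCED       YES         0        0

qname            gpu.q
hostlist         gpunode01     (hypothetical host name)
user_lists       gpu_users     (hypothetical ACL name)
complex_values   gpu=2
```

With a FORCED consumable like this, a job would have to request the resource explicitly, e.g. `qsub -l gpu=1 job.sh`, to be scheduled into the queue.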
Often there are several ways to implement a given setup. I use different queues mainly to have different locations of the scratch directory on one and the same node (besides traditional disks with /scratch, one can have /ssd or /ramdisk too), while others like to split the queues by purpose: interactive, batch or GPU jobs. Essentially it's personal taste.

So with the GPU: you can attach a FORCED resource per exechost or queue instance, and/or limit the access by assigning an xuser_lists entry. Another option could be to put this policy in an RQS or JSV.

In your case I'm a little bit lost, as IIRC you started the thread with the complex being attached to an ACL. Nevertheless, I think we are facing two setup challenges in your case:

- Limit the access to certain nodes/queues.
- Track the usage of the GPUs on these nodes, so that each job gets a unique one.

As William mentions below: are these nodes exclusively reserved for dedicated users, or should other users be able to use them, but not the GPU?

-- Reuti

> Best regards,
> Juan Jimenez
> System Administrator, BIH HPC Cluster
> MDC Berlin / IT-Dept.
> Tel.: +49 30 9406 2800
>
> On 17.05.17, 09:44, "William Hay" <w....@ucl.ac.uk> wrote:
>
> On Tue, May 16, 2017 at 08:07:15PM +0000, juanesteban.jime...@mdc-berlin.de wrote:
>> In our cluster we have one node with two Nvidia GPUs. I have been trying to
>> figure out how to set them up as consumable resources tied to an ACL, but I
>> can't get SGE to handle them correctly. It always says the resource is not
>> available.
>>
>> Can someone walk me through the steps required to set this up correctly? The
>> docs I have found are rather cryptic.
>
> Assuming you want other people to be able to use the node but not the GPUs
> I would think the process would be:
> 1) Define the resource in the complex_values of a queue that exists only on
> the node in question.
> 2) Add the Grid Engine ACL to the queue.
> 3) Ensure all resources shared between gpu and non-gpu jobs (including
> slots/cpus) are defined on the host rather than the queue.
>
> You might want to set up the prolog and epilog to twiddle the permissions
> on the /dev/ files representing the GPUs so only the job can access them
> to enforce access.
>
> William
>
> _______________________________________________
> SGE-discuss mailing list
> SGE-discuss@liv.ac.uk
> https://arc.liv.ac.uk/mailman/listinfo/sge-discuss
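The thread asks for each job to get a unique GPU and suggests prolog/epilog scripts. One possible way to do the bookkeeping is a lock directory per GPU, as sketched below. This is not something proposed by anyone in the thread: the function names, the lock location and the variables GPU_COUNT and GPU_LOCK_DIR are all assumptions made for illustration, and the permission change on the /dev/ files that William mentions is left out.

```shell
#!/bin/sh
# Hypothetical helpers for a prolog/epilog pair; all names and paths are assumptions.

GPU_COUNT="${GPU_COUNT:-2}"                     # the node in the thread has two GPUs
GPU_LOCK_DIR="${GPU_LOCK_DIR:-/tmp/gpu-locks}"  # assumed lock location

# Claim the first free GPU for a job id.
# Prints the GPU index on success, returns 1 if no GPU is free.
claim_gpu() {
    job_id="$1"
    mkdir -p "$GPU_LOCK_DIR" || return 1
    i=0
    while [ "$i" -lt "$GPU_COUNT" ]; do
        # mkdir is atomic: only one caller can create a given lock directory.
        if mkdir "$GPU_LOCK_DIR/gpu$i" 2>/dev/null; then
            echo "$job_id" > "$GPU_LOCK_DIR/gpu$i/owner"
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Release every GPU held by a job id (intended for the epilog).
release_gpu() {
    job_id="$1"
    for lock in "$GPU_LOCK_DIR"/gpu*; do
        [ -f "$lock/owner" ] || continue
        if [ "$(cat "$lock/owner")" = "$job_id" ]; then
            rm -rf "$lock"
        fi
    done
}
```

A prolog could then call something like `claim_gpu "$JOB_ID"` and an epilog `release_gpu "$JOB_ID"`, assuming the job id is available to the script as $JOB_ID. How the job learns which device it was given, and how the device permissions are adjusted, would still have to be worked out for the local setup.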