On Mon, Feb 13, 2017 at 03:52:20PM +0000, Mark Dixon wrote:
> Hi,
> 
> I've been playing with allocating GPUs using gridengine and am wondering if
> I'm trying to make it too complicated.
> 
> We have some 24 core, 128G RAM machines, each with two K80 GPU cards in
> them. I have a little client/server program that allocates named cards to
> jobs (via a starter method and the handy job gid).

We tweak the permissions on the device nodes from a privileged prolog,
but otherwise I suspect we're doing something similar.  One thing to watch out
for is that, unless you disable it, the device driver can change the permissions
on the device nodes behind your back.
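A minimal sketch of that sort of prolog (not our actual script; the allocation
file, the device paths and the use of SGE_O_LOGNAME are assumptions for
illustration, the real card selection would come from your allocator):

    #!/bin/bash
    # Privileged prolog, e.g. "prolog root@/usr/local/sbin/gpu_prolog.sh"
    # in the queue configuration.  Assumes the allocator has written the
    # card indices assigned to this job into $TMPDIR/gpu_devices, one per line.
    while read -r idx; do
        # Hand the device node to the job owner and lock everyone else out.
        chown "${SGE_O_LOGNAME}" "/dev/nvidia${idx}"
        chmod 0600 "/dev/nvidia${idx}"
    done < "${TMPDIR}/gpu_devices"
    exit 0

As for the driver resetting the modes, if memory serves it can be told to
leave the device files alone with the NVreg_ModifyDeviceFiles module option,
e.g. in /etc/modprobe.d/nvidia.conf:

    options nvidia NVreg_ModifyDeviceFiles=0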


> 
> What's left is the most important question: how do users request these
> resources?
> 
> I'm worried that, if I ask them to specify all the job's resources, a
> mouthful like "-pe smp 12 -l h_rt=1:0:0,h_vmem=64G,k80=1", just to get one
> card, could all too easily result in a K80 sitting idle if the numbers are
> tweaked a little.
> 
> Instead, I'm wondering if I should encourage users to request a number of
> cards and then we allocate a proportion of cpu and memory based on that (via
> per-slot complex, JSV and starter method).

Around here our examples put the options in the script after #$ rather
than on the command line.  That makes things a lot more readable.  We
save job scripts for support purposes, so having the requested options
in there is helpful (interactive jobs with qrsh etc. excepted, obviously).
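For example, the request in your message would look something like this as a
job script (the program at the end is just a placeholder):

    #!/bin/bash
    #$ -pe smp 12
    #$ -l h_rt=1:0:0
    #$ -l h_vmem=64G
    #$ -l k80=1
    #$ -cwd

    ./my_gpu_program

so the user just runs "qsub myjob.sh" and the requests travel with the script.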
> 
> Is that too simplistic, or would it be a welcome relief? What does the qsub
> command line look like at other sites for requesting GPUs?

We have separate requests for memory, GPUs, local scratch space, etc., with
sensible defaults.  If someone did use the command line, it could end up
looking quite like the example you give.
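To give a flavour of what I mean by defaults (the names and numbers below are
made up, not our real configuration): the consumables carry a default that is
charged when a job doesn't request them, and site-wide submit defaults live in
the sge_request file.

    # qconf -mc  (complex definitions; "default" is what a job is charged
    #             if it doesn't request the resource explicitly)
    #name      shortcut  type    relop  requestable  consumable  default  urgency
    gpu        gpu       INT     <=     YES          YES         0        0
    tmpspace   tmp       MEMORY  <=     YES          YES         10G      0

    # $SGE_ROOT/$SGE_CELL/common/sge_request  (cluster-wide default options)
    -l h_vmem=4G
    -l h_rt=24:0:0

With that in place a typical GPU job only needs something like
"qsub -l gpu=1 myjob.sh" and picks up sensible values for everything else.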

William
