Re: [slurm-users] Job dispatching policy

2019-04-24 Thread John Hearns
I would suggest that if those applications really cannot run under Slurm, then you reserve a set of nodes for interactive use and disable the Slurm daemon on them. Direct users to those nodes. More constructively, maybe the list can help you get the X11 applications to run using Slurm. Could yo
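As a sketch of the constructive route, assuming a Slurm release with native X11 forwarding (17.11 or later) and users who reach the login node with `ssh -X`. `xterm` is a placeholder for the actual application, and depending on the release and build, extra setup may be needed:

```shell
# Native X11 forwarding is switched on in slurm.conf (assumed 17.11+):
#   PrologFlags=X11
# From a login session opened with "ssh -X", run the X11 program through
# Slurm so it lands on a compute node instead of a reserved interactive node.
srun --x11 xterm
```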

Re: [slurm-users] Effect of PriorityMaxAge on job throughput

2019-04-24 Thread David Baker
Hello Michael, Thank you for your email and apologies for my tardy response. I'm still sorting out my mailbox after an Easter break. I've taken your comments on board and I'll see how I go with your suggestions. Best regards, David

[slurm-users] Limit concurrent gpu resources

2019-04-24 Thread Mike Cammilleri
Hi everyone, We have a single node with 8 GPUs. Users often queue up lots of jobs and occupy all 8 at the same time, so a user who just wants a short debug run on one GPU has to wait too long for a GPU to free up. Is there a way with gres.con

Re: [slurm-users] Limit concurrent gpu resources

2019-04-24 Thread Renfro, Michael
We put a ‘gpu’ QOS on all our GPU partitions, and limit jobs per user to 8 (our GPU capacity) via MaxJobsPerUser. Extra jobs get blocked, allowing other users to queue jobs ahead of the extras.
# sacctmgr show qos gpu format=name,maxjobspu
      Name MaxJobsPU
---------- ---------
       gpu
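A minimal sketch of that setup, assuming a partition named `gpu` on a hypothetical node `gpu01`. The QOS name and the limit of 8 come from the post; the node name and the enforcement line are placeholders to adapt:

```shell
# Create the QOS (-i skips the confirmation prompt) and cap each user
# at 8 concurrently running jobs.
sacctmgr -i add qos gpu
sacctmgr -i modify qos gpu set MaxJobsPerUser=8
# Attach the QOS to the GPU partition in slurm.conf (placeholder node name):
#   PartitionName=gpu Nodes=gpu01 QOS=gpu
# QOS limits are only enforced when accounting enforcement includes them:
#   AccountingStorageEnforce=limits,qos
# Verify the limit:
sacctmgr show qos gpu format=name,maxjobspu
```

With the limit in place, a user's ninth job should stay pending with reason QOSMaxJobsPerUserLimit while other users' jobs can start. If jobs request several GPUs each, capping GPUs rather than jobs (for example `MaxTRESPerUser=gres/gpu=4`) may fit better; that variant is not part of the original reply.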

Re: [slurm-users] scontrol for a heterogenous job appears incorrect

2019-04-24 Thread Jeffrey R. Lang
Chris, upon further testing this morning I see the job is assigned two different job IDs, something I wasn't expecting. This led me to think the output was incorrect. scontrol on a heterogeneous job shows multiple job IDs for the job, one per component. So the output just wasn't what I was expecting.
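To illustrate, here is a hypothetical two-component job submitted from the command line, where `:` separates the options of each component. `job.sh` and job ID 1234 are placeholders:

```shell
# Component 0 gets 1 task, component 1 gets 4 tasks.
sbatch --ntasks=1 : --ntasks=4 job.sh
# squeue lists each component as <jobid>+<offset>, e.g. 1234+0 and 1234+1.
squeue -u "$USER"
# scontrol prints one record per component, each with its own JobId.
scontrol show job 1234
```

In the 18.08 series each component's record also carries PackJobId and PackJobOffset fields that tie it back to the leader job; later releases rename these to HetJobId and HetJobOffset.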

Re: [slurm-users] Limit concurrent gpu resources

2019-04-24 Thread Prentice Bisbal
Here's how we handle this: Create a separate partition named debug that also contains that node. Give the debug partition a very short time limit, say 30-60 minutes: long enough for debugging, but too short to do any real work. Make the priority of the debug partition much higher than t
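Those steps might look like the following slurm.conf fragment, shown as comments in a shell block. The node name `gpu01`, the 7-day limit on the main partition, and the use of PriorityTier to express "much higher priority" are assumptions; PriorityJobFactor under the multifactor priority plugin is another option:

```shell
# Hypothetical slurm.conf fragment: both partitions share the same GPU node.
#   PartitionName=gpu   Nodes=gpu01 Default=YES MaxTime=7-00:00:00 PriorityTier=1
#   PartitionName=debug Nodes=gpu01 MaxTime=00:30:00 PriorityTier=10
# After updating slurm.conf on all nodes, have the daemons re-read it
# (some changes may need a slurmctld restart instead):
scontrol reconfigure
# A short debug job then targets the high-priority partition:
srun -p debug --gres=gpu:1 --time=00:20:00 --pty bash
```

Because both partitions share the node, a debug job still waits for a running job to release a GPU; the higher tier only moves it to the front of the queue for the next free GPU.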