The simplest approach is probably to have a separate partition that only
allows jobs with a time limit of 1 hour or less.
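A minimal slurm.conf sketch of that idea, with two partitions overlapping the
same hardware (the node range node[01-10] and the 21-day long-partition limit
are placeholder assumptions, not from this thread):

```
# Both partitions cover the same nodes; "short" caps runtime at 1 hour.
PartitionName=short Nodes=node[01-10] MaxTime=01:00:00    State=UP
PartitionName=long  Nodes=node[01-10] MaxTime=21-00:00:00 Default=YES State=UP
```

With MaxTime set on the short partition, jobs submitted there that request
more than an hour are rejected at submission rather than killed mid-run.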

This is how our Univa queues used to work, overlapping the same hardware.
Univa shows available "slots" to users, so we had a lot of confused users
complaining about all those free slots (actually busy slots in the other
queue) while their jobs sat in the queue, and new users confused about why
their jobs were being killed after 4 hours. I was eventually able to move the
short/long behavior into job classes and RQSes and keep a single queue.

While Slurm doesn't show users unused resources the way Univa does, I am
concerned that going back to two queues (partitions) will cause
user-interaction and adoption problems.

It all depends on what best suits the specific needs.

Is there a way to have one partition that holds aside a small percentage of
resources for jobs with a runtime under 4 hours, i.e., so that jobs with long
runtimes cannot tie up 100% of the resources at one time? Some kind of
virtual partition that feeds into two other partitions based on runtime would
also work. The goal is that users can continue to submit jobs to one
partition, but the scheduler won't let 100% of the compute resources get tied
up with multi-week jobs.
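One hedged sketch of how that cap might be expressed in Slurm: attach a QOS
with a GrpTRES limit to the long partition, so that all long jobs together
can never hold more than a fixed number of CPUs. The QOS name, node range,
and the cpu=512 figure below are illustrative assumptions, not from this
thread:

```
# Create a QOS that caps the total CPUs all long-running jobs may hold.
sacctmgr add qos long_qos
sacctmgr modify qos long_qos set GrpTRES=cpu=512

# In slurm.conf, bind the QOS to the long partition; the short partition
# is uncapped, so some cores always remain reachable by quick jobs.
PartitionName=long  Nodes=node[01-10] QOS=long_qos MaxTime=21-00:00:00
PartitionName=short Nodes=node[01-10] MaxTime=04:00:00
```

The "virtual partition" part is typically handled with a job_submit.lua
plugin, which can inspect each job's requested time limit at submission and
set its partition accordingly, so users keep a single submission target.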

Thanks.
On 12/16/2019 2:29 PM, Ransom, Geoffrey M. wrote:

Hello
   I am looking into switching from Univa (sge) to slurm and am figuring out 
how to implement some of our usage policy in slurm.

We have a Univa queue which uses job classes and RQSes to limit jobs with a run 
time over 4 hours to only half the available slots (CPU cores) so some slots 
are always free for quick jobs. We don't want all of our resources tied up
with multi-week jobs when someone has a batch of 1-hour jobs to run.

Is there a way to implement this in slurm? To have a partition which will hold 
some CPU/GPU resources aside for jobs with a short runtime.

What would be the preferred solution for this issue in a slurm world?
