> On 21.10.2016 at 17:25, Michael Stauffer <mgsta...@gmail.com> wrote:
>
> Maybe it would be good to tell the user not to submit into a queue at all,
> but to request resources and let SGE select an appropriate queue for the
> job.
>
> I have two main queues: all.q, which is a batch queue for newer compute
> nodes, and basic.q, which is a batch queue for older, slower nodes that are
> used much less often.
Instead of submitting into the queues, you could define a RESTRING complex
and attach it to each exechost: maybe the name of the CPU, its code name, or
just "old" and "new". Users could then submit by requesting:

$ qsub -l CPU=old ...

and would be directed to the old nodes. Having the complex FORCED and
attached only to the old nodes will behave like now, except that users
request a resource instead of a queue (and you would need only one queue).

Even better: a BOOL complex attached only to the old nodes and set to TRUE.
Then the submission could use:

$ qsub -l old ...

as TRUE will then be used by default.

> I also have a separate queue for qlogin sessions (if I remember right, I
> set up a separate qlogin queue a long time ago when I first set this system
> up so I could have a time-limit on sessions).

Yes, that's feasible. As (only) this queue is set to interactive, I assume
these sessions should always end up therein (unless a user uses "-now no").
"Interactive" in SGE means more like "immediately".

> Would it make sense to have a resource that differentiates between these
> queues that users would request in order for SGE to choose the appropriate
> queue, or leave it as I have it currently, in which all.q is the default,
> and if a user wants to run on basic.q, they request it manually via the
> qsub -q option?

Personally I prefer as few queues as possible, and to request resources for
a given job.

> I'll probably be adding some queues soon that have different time limits,
> to better corral long-running jobs. I know there's a mechanism for doing
> this, but haven't looked into it yet. I imagine it's what you're suggesting
> here?

Yes. Although you need a second queue, the users will specify the expected
runtime of their job:

$ qsub -l h_rt=2:15:00 ...

The users request the maximum running time of their jobs, so if necessary
they will end up in "long.q", where h_rt is requestable and set to a higher
value than in all.q.
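To make the BOOL-complex suggestion concrete, here is an untested sketch of
the setup steps (the complex name "old" and the host name "node01" are just
example names, not from the original thread):

```shell
# 1. Add the complex (qconf -mc opens the complex list in $EDITOR).
#    Column format: name  shortcut  type  relop  requestable  consumable  default  urgency
#
#    old  old  BOOL  ==  FORCED  NO  FALSE  0
qconf -mc

# 2. Attach it only to the older exechosts, e.g. a hypothetical node01,
#    by adding to the host configuration:  complex_values old=TRUE
qconf -me node01

# 3. Jobs intended for the old nodes then request it explicitly
#    (FORCED means jobs without this request can never land there):
qsub -l old ...
```

Since these are SGE configuration fragments, they can only be verified
against a live SGE cell.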
Nevertheless, short jobs can also end up in long.q, in case the nodes would
otherwise be idle. It might be necessary to limit the number of slots per
exechost (in case a host belongs to both queues), to avoid a node getting
oversubscribed: either by setting slots per exechost to the number of
installed cores, or by an RQS (I set it per exechost to keep the RQS resp.
`qquota` output short).

-- Reuti

> > and I hadn't looked carefully enough to notice that. So now I'm not sure
> > about the couple other times I've seen this in the past, it might have
> > been something like that.
> >
> > Skylar, thanks for the qstat -w tip, I'll use that in the future.
> >
> > Reuti, if I were to adjust the setup not to use RQS, how would I limit
> > users' resource usage?
>
> It was only suggested as a test. I saw situations where a combination of
> consumables and limits in RQS blocks the scheduling completely, showing
> something like "... offers only (-l none)."
>
> In case you have to limit the usage per user, you have to use them for sure.
>
> OK thanks, I thought you maybe were suggesting there's another way to limit
> resources by user.
>
> -M

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
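As a footnote to the slot-limiting point above, an untested sketch of the
RQS variant (the rule-set name is made up; Reuti's own preference was to set
slots per exechost instead):

```shell
# Add a resource quota set capping slots per host at the number of
# installed cores (qconf -arqs opens $EDITOR; the name is arbitrary):
qconf -arqs slots_per_host
#    {
#       name         slots_per_host
#       description  "do not oversubscribe a node shared by all.q and long.q"
#       enabled      TRUE
#       limit        hosts {*} to slots=$num_proc
#    }

# Inspect current consumption against the quota:
qquota
```

This is a configuration fragment and can only be checked against a running
SGE installation.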