Hi, I'm trying to understand RQS and set up several rules, but they don't
quite work.
I have four queues (A, B, C and all.q). I followed this
suggestion:
http://www.gridengine.info/2009/03/30/evading-quota-limits-when-resources-are-available/
So I have:
qconf -sq A has "subordinate_list all.q B C"
qconf -sq B has "subordinate_list all.q A C"
qconf -sq C has "subordinate_list all.q A B"
Each of the queues A, B and C is also associated with the system primary
group of the same name (A, B, C).
My goal, for example, is that users in group A can only run in queue A,
limited to at most 96 slots (8 x 12-core nodes); anything beyond that
should overflow to all.q.
The same applies to B and C.
My rules to limit the number of slots per queue are:
{
name Alimit
description limit number of slots to all users in A queue
enabled TRUE
limit users {*} queues A to slots=96
}
{
name Blimit
description limit number of slots to all users in B queue
enabled TRUE
limit users {*} queues B to slots=24
}
{
name Climit
description limit number of slots to all users in C queue
enabled TRUE
limit users {*} queues C to slots=12
}
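A minimal Python sketch of how the three rules above are evaluated, assuming the meaning of `{*}` described in sge_resource_quota(5): the braces make the limit apply to each user separately rather than to all users combined. The names `LIMITS`, `allows`, `alice` and `bob` are hypothetical; this is a model for reasoning about the rules, not Grid Engine's scheduler code.

```python
from collections import defaultdict

# Hypothetical model of the rules "limit users {*} queues Q to slots=N".
# The {*} means each user gets their own N-slot allowance in queue Q.
LIMITS = {"A": 96, "B": 24, "C": 12}  # 96 = 8 nodes x 12 cores

def allows(usage, user, queue, slots):
    """Return True if giving `user` `slots` more slots in `queue` stays within that user's limit.

    usage maps (user, queue) -> slots currently in use by that user.
    Queues without a rule (e.g. all.q) are unlimited in this model.
    """
    limit = LIMITS.get(queue)
    if limit is None:
        return True
    return usage[(user, queue)] + slots <= limit

usage = defaultdict(int)
usage[("alice", "A")] = 96  # alice already fills 8 x 12-core nodes in A

print(allows(usage, "alice", "A", 12))      # False: alice is at her 96-slot limit
print(allows(usage, "bob", "A", 12))        # True: bob has his own 96-slot allowance
print(allows(usage, "alice", "all.q", 12))  # True: no rule covers all.q
```

Under this reading, two different users submitting to queue A could together hold up to 192 slots there. If the intent is one 96-slot cap shared by everyone in the queue, the unbracketed form `users *` is the variant to check against the man page.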
So far this doesn't quite work when a user submits a lot of jobs. If user A
submits eight 12-slot jobs, they all run in queue A, each job using 12
slots on one host. All is good.
However, if A submits more than eight, say nine or ten 12-slot jobs, the
new jobs sometimes run with 1 slot in queue A and 11 slots in all.q,
spanning two nodes.
Queue A then ends up with more than 96 slots in use. The same behavior
occurs on all three primary queues.
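One way to confirm the overrun described above is to tally the slots each queue holds across all running jobs. This is a small Python sketch; the job placements are hypothetical, written by hand from the scenario in the post (eight full 12-slot jobs in A, then a ninth job split 1 + 11), and `slots_per_queue` is an illustrative helper, not a Grid Engine tool.

```python
from collections import Counter

# Hypothetical placements: eight 12-slot jobs entirely in queue A,
# then a ninth 12-slot job split as 1 slot in A and 11 in all.q.
jobs = [{"A": 12} for _ in range(8)]
jobs.append({"A": 1, "all.q": 11})

def slots_per_queue(jobs):
    """Sum the slots each queue is hosting across all jobs."""
    total = Counter()
    for placement in jobs:
        total.update(placement)  # Counter.update adds counts per key
    return total

totals = slots_per_queue(jobs)
print(totals["A"], totals["all.q"])  # 97 11
print(totals["A"] > 96)              # True: queue A exceeds the 96-slot rule
```

The split ninth job contributes only one slot to queue A, yet that single slot is enough to push A past 96, which matches the symptom in the post.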
Can someone give me some advice, please? I don't quite understand this
behavior. If a limit is set for queue A, it should never have more than 96
slots in use, correct?
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users