Hi.  I'm trying to move from load-based to sequence-based scheduling, and I
have a problem.  First, a little something about my setup:

I have two sets of machines - 176 'fast' cores in 16-core nodes, and 90
'slow' cores in 2-core nodes.  I have two corresponding queues - slow.q and
fast.q.  The queues are non-requestable.  fast.q looks at the @fast host
group, which contains only the names of the fast nodes, and slow.q looks at
the @slow host group, which contains only the names of the slow nodes.  In
fast.q, I have slots = 16 and processors = 16, while in slow.q I have slots
= 2 and processors = 2.  Finally, slow.q is seq_no 1 and fast.q is seq_no 2.
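To make the layout above concrete, here is a minimal Python sketch that models the two queues as plain data and lists them in the order a sequence-number sort would consider them. The `Queue` class and its field names are hypothetical and exist only for this illustration. The node counts (11 and 45) are derived from the core counts in the post (176 / 16 and 90 / 2).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    """Hypothetical model of one cluster queue (not a Grid Engine API)."""
    name: str            # queue name, e.g. "fast.q"
    hostgroup: str       # host group the queue points at, e.g. "@fast"
    slots_per_host: int  # value of "slots" (and "processors") in the queue
    num_hosts: int       # number of nodes in the host group
    seq_no: int          # lower seq_no is considered first

    @property
    def total_slots(self) -> int:
        return self.slots_per_host * self.num_hosts


# Setup from the post: 176 fast cores on 16-core nodes,
# 90 slow cores on 2-core nodes.
QUEUES = [
    Queue("fast.q", "@fast", slots_per_host=16, num_hosts=176 // 16, seq_no=2),
    Queue("slow.q", "@slow", slots_per_host=2, num_hosts=90 // 2, seq_no=1),
]

# Order in which a sort on seq_no would visit the queues.
by_seq_no = sorted(QUEUES, key=lambda q: q.seq_no)

for q in by_seq_no:
    print(q.name, q.hostgroup, q.num_hosts, "hosts,", q.total_slots, "slots")
```

With seq_no 1 on slow.q, the 90-slot slow queue is visited before the 176-slot fast queue.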

Here's the problem:  If I submit a 120-processor job (so it's too large to
fit on the slow cores), it still gets assigned to slow.q.  This in itself
is bad: I want such a job to go directly to fast.q.  It gets worse, though.
Because there aren't enough cores in slow.q (only 90), the remaining 30 threads
end up on nodes in fast.q!  I don't understand how this second part is
possible.  I've run qstat -f, and my 'fast' compute nodes definitely
aren't listed as being members of slow.q.
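The 90/30 split described above matches a simple model in which the scheduler fills queues in ascending seq_no order and the job is allowed to take slots from more than one queue. The sketch below is a hypothetical illustration of that arithmetic only, not Grid Engine's actual scheduling code; the `allocate` function and the `(name, free_slots, seq_no)` tuple layout are made up for this example, and all slots are assumed to be free.

```python
def allocate(queues, requested):
    """Simplified greedy model (hypothetical): take free slots queue by
    queue in ascending seq_no until the request is met.
    Returns {queue_name: slots}, or None if the request cannot be met."""
    plan = {}
    remaining = requested
    for name, free_slots, _seq_no in sorted(queues, key=lambda q: q[2]):
        if remaining == 0:
            break
        take = min(free_slots, remaining)
        if take > 0:
            plan[name] = take
            remaining -= take
    return plan if remaining == 0 else None


# (queue name, free slots, seq_no) for the setup in the post, all idle.
cluster = [("fast.q", 176, 2), ("slow.q", 90, 1)]

print(allocate(cluster, 120))  # {'slow.q': 90, 'fast.q': 30}
```

Under this model, a 120-slot request exhausts the 90 slots of slow.q first and takes the remaining 30 from fast.q, while a request of 16 or fewer slots would stay entirely in slow.q.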

Any suggestions?  Thank you.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
