On Mon, Feb 13, 2017 at 02:26:18PM -0500, Michael Stauffer wrote:
> SoGE 8.1.8
> Hi,
> I'm getting some queued jobs with scheduling info that includes this line
> at the end:
> cannot run in PE "unihost" because it only offers 0 slots
> 'unihost' is the only PE I use. When users request multiple slots, they use
> 'unihost':
> ... -binding linear:2 -pe unihost 2 ...
> What happens is that these jobs aren't running when it otherwise seems like
> they should be, or they sit waiting in the queue for a long time even when
> the user has plenty of quota available within the queue they've requested,
> and there are enough resources available on the queue's nodes (slots and
> vram are consumables).
> Any suggestions about how I might further understand this?
This *exact* problem has bitten me in the past. It seems to crop up
about every 3 years: often enough that I remember it was a problem, and
far enough apart that I forget just what the [censored] I did to fix it.
As I recall, it has little to do with actual PEs, but everything to do
with complexes and resource requests.
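You might glean a bit more information by turning on job validation at
submit time: "qsub -w p" prints a validation report against the current
cluster state without submitting the job, and "-w e" rejects a job whose
requests cannot be satisfied.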
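As a rough sketch, this is what that check could look like for the
request quoted above. The dry_run_cmd helper and the job.sh script name
are made up for illustration; they are not part of Grid Engine, and the
remaining options should be whatever the real job normally uses.

```shell
#!/bin/sh
# Hypothetical helper: build a qsub command line with a validation level.
#   $1         validation level passed to -w ("p" or "e")
#   remaining  the options and script the job is normally submitted with
dry_run_cmd() {
    level=$1
    shift
    echo "qsub -w $level $*"
}

# The request from the original post; job.sh is a placeholder script name.
cmd=$(dry_run_cmd p -binding linear:2 -pe unihost 2 job.sh)
echo "$cmd"

# Run the check only on a host where Grid Engine is actually installed.
if command -v qsub >/dev/null 2>&1; then
    $cmd
fi
```

The report should name the resource request that cannot be satisfied,
which is usually more telling than the PE message itself.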
Take a look at these previous discussions:
http://gridengine.org/pipermail/users/2011-November/001932.html
http://comments.gmane.org/gmane.comp.clustering.opengridengine.user/1700
--
Jesse Becker (Contractor)