Hi,

> On 24.10.2016 at 19:15, Jerome <jer...@ibt.unam.mx> wrote:
> 
> Dear all
> 
> I've installed a two-node Rocks cluster with SGE for a course. Each node has
> 4 cores.
> I shut down one node, so only 4 cores are available:
> 
> $ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch   states
> ---------------------------------------------------------------------------------
> all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
> ---------------------------------------------------------------------------------
> all.q@compute-0-1.local        BIP   0/0/4          -NA-     linux-x64   au
> 
> 
> 
> But I ran into a strange issue that I can't explain yet:
> My user submitted a parallel job requesting 8 cores.
> When I check the job, which is in the "qw" state, I get this message:
> 
> $ qstat -j 58
> ../..
> 
> scheduling info:            queue instance "all.q@compute-0-1.local" dropped 
> because it is temporarily not available
>                            cannot run in PE "orte" because it only offers 7 
> slots

The scheduling info message does not always reflect the actual cause, so
further investigation is necessary. Nevertheless, with only 4 slots available,
the job can't start because it requests 8.
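
To double-check the slot count by hand, here is a minimal sketch that sums the free slots from `qstat -f` output. The function name `free_slots` is made up for this example, and it assumes that any flag in the states column (such as "au") makes the queue instance unusable, which is a simplification. The sample input is the first listing quoted above.

```shell
#!/bin/sh
# Sum the free slots over queue instances that show no state flags.
# Reads `qstat -f` style output on stdin. free_slots is a hypothetical helper.
free_slots() {
    awk '
        # Queue-instance rows: name@host qtype resv/used/tot load_avg arch [states]
        $1 ~ /@/ && split($3, s, "/") == 3 {
            if (NF >= 6) next    # assumption: any state flag (e.g. au) means unusable
            free += s[3] - s[2] - s[1]
        }
        END { print free + 0 }
    '
}

# Sample copied from the first listing in this thread (compute-0-1 is down).
free_slots <<'EOF'
queuename                      qtype resv/used/tot. load_avg arch   states
---------------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
---------------------------------------------------------------------------------
all.q@compute-0-1.local        BIP   0/0/4          -NA-     linux-x64   au
EOF
```

For the sample this prints 4, which is less than the 8 slots the job requests. On a live cluster the same function could be fed with `qstat -f | free_slots`.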


> If I power on the second node, the message is the same:
> 
> $ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch   states
> ---------------------------------------------------------------------------------
> all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
> ---------------------------------------------------------------------------------
> all.q@compute-0-1.local        BIP   0/0/4          0.10     linux-x64
> 
> 
> $ qstat -j 58
> 
> ../..
> 
> parallel environment:  orte range: 8
> version:                    3
> scheduling info:            cannot run in PE "orte" because it only offers 7 
> slots

With both nodes up, 8 slots are free, so the job should start. Something other
than the number of free slots is blocking it.


> I've searched through the whole SGE configuration. I also reinstalled both
> nodes. But the same message appears, saying that only 7 slots are offered!

Did you request any resources for the job, such as memory? Is any RQS
(resource quota set) in place? Do you use "job_load_adjustments" in the
scheduler configuration?
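
A rough sketch of how these three points could be checked from the command line. The wrapper name `check_job_blockers` is hypothetical. The job id 58 and the PE name "orte" are taken from your output. The `grep` patterns assume the usual field labels in the `qstat -j` and `qconf -ssconf` output, so adjust them if your version prints something different.

```shell
#!/bin/sh
# Print the settings that commonly keep a parallel job in "qw".
# check_job_blockers is a hypothetical helper: pass the job id and the PE name.
check_job_blockers() {
    job_id=$1
    pe_name=$2

    echo "== requests of job $job_id =="
    # Shows requested resources (if any) and the PE request with its slot range.
    qstat -j "$job_id" | grep -E 'resource_list|parallel environment' || true

    echo "== resource quota sets =="
    # Returns an error when no RQS is defined, which is fine here.
    qconf -srqs || true

    echo "== scheduler load adjustments =="
    qconf -ssconf | grep -E 'job_load_adjustments|load_adjustment_decay_time' || true

    echo "== parallel environment $pe_name =="
    qconf -sp "$pe_name" || true
}

# Run only where the SGE client tools are installed.
if command -v qconf >/dev/null 2>&1; then
    check_job_blockers 58 orte
fi
```

In the PE section, the `slots` line is worth comparing against the 8 slots the job requests.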

-- Reuti


> 
> Can someone give me some help?
> 
> Regards
> 
> 
> -- 
> -- Jérôme
> No one has ever seen a blind man in a nudist camp.
>       (Woody Allen)
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
> 

