On Mon, Oct 24, 2016 at 12:15:41PM -0500, Jerome wrote:
> Dear all
> 
> I've installed a Rocks cluster of 2 nodes with SGE for a course. Each
> node has 4 cores.
> I shut down one node, so only 4 cores are available:
> 
> $ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
> ---------------------------------------------------------------------------------
> all.q@compute-0-1.local        BIP   0/0/4          -NA-     linux-x64     au
> 
> 
> 
> But I've run into a strange issue that I can't explain yet:
> My user submitted a parallel job requesting 8 cores.
> When I check the job, which is in the "qw" state, I get this message:
> 
> $ qstat -j 58
>   ../..
> 
>   scheduling info:            queue instance "all.q@compute-0-1.local" dropped because it is temporarily not available
>                               cannot run in PE "orte" because it only offers 7 slots
> 
> If I power on the second node, the message is the same:
> 
> $ qstat -f
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
> ---------------------------------------------------------------------------------
> all.q@compute-0-1.local        BIP   0/0/4          0.10     linux-x64
> 
> 
> $ qstat -j 58
> 
> ../..
> 
> parallel environment:  orte range: 8
> version:                    3
> scheduling info:            cannot run in PE "orte" because it only offers 7 slots
> 
> 
> I've searched through the whole SGE configuration, and I've also
> reinstalled both nodes. But the same message still appears, saying that
> only 7 slots are free!
> 
> Can someone help me?

What do "qconf -sp orte" and "qconf -sq all.q" report?
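
If it helps, here is a minimal sketch for pulling out the fields I would
compare against the 8 requested slots. The helper name get_field is
hypothetical (not part of SGE). I'm also assuming the usual "key value"
layout of qconf output, with each value on a single line.

```shell
# get_field KEY: print the value of the first "KEY value" line read on stdin.
# Hypothetical helper; assumes the value is not continued onto a second line.
get_field() {
    awk -v key="$1" '$1 == key { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Only query SGE when qconf is actually on the PATH.
if command -v qconf >/dev/null 2>&1; then
    echo "PE orte slots:           $(qconf -sp orte | get_field slots)"
    echo "PE orte allocation_rule: $(qconf -sp orte | get_field allocation_rule)"
    echo "all.q slots:             $(qconf -sq all.q | get_field slots)"
    echo "all.q pe_list:           $(qconf -sq all.q | get_field pe_list)"
fi
```

The full, unfiltered output of both commands is still the most useful
thing to post back to the list.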

-- 
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
