Dear Reuti

I've got it! I had been checking the slot and core definitions.
The error was that h_vmem=7G was defined in the "global" entry of the Execution Hosts instead of in each compute node's definition. So, as my job was asking for 1G per core, the limit was 7 slots.
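
The arithmetic behind the 7-slot limit can be sketched as below. This is only an illustration, assuming h_vmem is set up as a per-slot consumable (as the explanation above implies); the function name max_slots is made up for this example and is not an SGE setting or command.

```shell
# Sketch: how many slots fit under a consumable memory limit when each
# slot requests a fixed amount. Both arguments are whole gigabytes.
# max_slots is a hypothetical helper, not part of SGE.
max_slots() {
    limit_gb=$1
    per_slot_gb=$2
    # Integer division: slots that fit without exceeding the limit
    echo $(( limit_gb / per_slot_gb ))
}

max_slots 7 1   # global h_vmem=7G, 1G requested per slot: prints 7
```

With the 7G value sitting on the "global" host, the limit applies cluster-wide, so an 8-slot job asking for 1G per slot can never fit, no matter how many nodes are up.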

Thanks a lot, Reuti, for making me check where I hadn't looked!

Regards

On 24/10/2016 at 12:34, Reuti wrote:
Hi,

On 24.10.2016 at 19:15, Jerome <jer...@ibt.unam.mx> wrote:

Dear all

I've installed, for a course, a Rocks Cluster of 2 nodes with SGE. Each node has 
4 cores.
I shut down one node, so only 4 cores are available:

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch   states
---------------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
---------------------------------------------------------------------------------
all.q@compute-0-1.local        BIP   0/0/4          -NA-     linux-x64   au



But I've run into a strange issue that I can't explain yet:
My user submits a parallel job with 8 cores.
When I check the job, which is in the "qw" state, I get back this message:

$ qstat -j 58
../..

scheduling info:            queue instance "all.q@compute-0-1.local" dropped because it is temporarily not available
                            cannot run in PE "orte" because it only offers 7 slots

The error message does not always reflect the correct cause and further 
investigation is necessary. Nevertheless, with only 4 slots available, the job 
can't start as you request 8.


If I power on the second node, the message is the same:

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch   states
---------------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
---------------------------------------------------------------------------------
all.q@compute-0-1.local        BIP   0/0/4          0.10     linux-x64


$ qstat -j 58

../..

parallel environment:  orte range: 8
version:                    3
scheduling info:            cannot run in PE "orte" because it only offers 7 slots

Now the job should start, and something else is blocking it, not the number of 
free slots.


I've searched through all of the SGE configuration. I also reinstalled 
the 2 nodes. But the same message still appears, saying that only 7 slots are free!

Did you request any resources for the job like memory? Any RQS in place? Do you use 
"job_load_adjustments" in the scheduler configuration?
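
Each of these questions maps onto a read-only query. A minimal sketch, assuming the SGE client tools (qstat, qconf) are on the PATH; the function name check_pending_job is made up for illustration, and the last query (consumables on the "global" host) is an extra check beyond the three questions:

```shell
# Sketch only: read-only queries for a pending job.
# check_pending_job is a hypothetical helper name, not part of SGE.
check_pending_job() {
    job_id=$1
    if ! command -v qconf >/dev/null 2>&1; then
        echo "SGE client tools (qconf, qstat) not found in PATH" >&2
        return 1
    fi
    # 1. Resources requested by the job (e.g. the "hard resource_list" line)
    qstat -j "$job_id" | grep -i "resource_list"
    # 2. Resource quota sets, if any are defined
    qconf -srqs
    # 3. Load adjustments in the scheduler configuration
    qconf -ssconf | grep job_load_adjustments
    # 4. Consumables set on the "global" execution host
    qconf -se global | grep complex_values
}
```

Calling it as `check_pending_job 58` would print whatever each query returns; an empty result for a step simply means nothing is configured there.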

-- Reuti



Can someone give me some help?

Regards


--
-- Jérôme
You have never seen a blind man in a nudist camp.
      (Woody Allen)
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users




--
-- Jérôme
- Mon dévouement vous est acquéris.
- Acquis, acquis ! souffle une voix charitable.
- À qui ? Mais à tous, citoyens.