Re: [gridengine users] Strange issue with one node

Jerome Mon, 24 Oct 2016 10:32:16 -0700

Dear Skylar,

I checked this too, and everything looks normal:


$ qconf -sp orte
pe_name            orte
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE


$ qconf -sq all.q
qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make mpi mpich orte thread
rerun                 FALSE
slots                 1,[compute-0-0.local=4],[compute-0-1.local=4]
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
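
For reference, the `slots` line above can be read as a default value followed by per-host overrides. Below is a minimal Python sketch of that reading, under stated assumptions: the helper name `slots_per_host` is hypothetical, the host list is taken from the `qstat -f` output quoted further down, and hostgroup overrides such as `[@group=N]` are not handled.

```python
import re

def slots_per_host(slots_value, hosts):
    """Resolve a queue's `slots` attribute for each host.

    The value is a default followed by optional [host=N] overrides,
    e.g. "1,[compute-0-0.local=4],[compute-0-1.local=4]".
    Simplification: hostgroup overrides ([@group=N]) are ignored.
    """
    default = int(slots_value.split(",", 1)[0])
    overrides = {
        host: int(n)
        for host, n in re.findall(r"\[([^=@\]]+)=(\d+)\]", slots_value)
    }
    return {host: overrides.get(host, default) for host in hosts}

# Hosts assumed from the qstat -f output in this thread.
hosts = ["compute-0-0.local", "compute-0-1.local"]
per_host = slots_per_host("1,[compute-0-0.local=4],[compute-0-1.local=4]", hosts)
total = sum(per_host.values())
print(per_host, total)
```

Under these assumptions the queue configuration alone allows 4 slots on each node, 8 in total, so it does not by itself explain a limit of 7.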


Thanks

Regards


On 24/10/2016 at 12:20, Skylar Thompson wrote:

On Mon, Oct 24, 2016 at 12:15:41PM -0500, Jerome wrote:

Dear all

For a course, I've installed a Rocks cluster of 2 nodes with SGE. Each
node has 4 cores.
I shut down one node, so only 4 cores are available:

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
---------------------------------------------------------------------------------
all.q@compute-0-1.local        BIP   0/0/4          -NA-     linux-x64     au



But I've run into a strange issue that I can't explain yet:
My user submitted a parallel job requesting 8 cores.
When I check the job, which is in the "qw" state, I get this message:

$ qstat -j 58
  ../..

  scheduling info:            queue instance "all.q@compute-0-1.local" dropped because it is temporarily not available
                              cannot run in PE "orte" because it only offers 7 slots

If I power the second node back on, the message is the same:

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
---------------------------------------------------------------------------------
all.q@compute-0-1.local        BIP   0/0/4          0.10     linux-x64


$ qstat -j 58

../..

parallel environment:  orte range: 8
version:                    3
scheduling info:            cannot run in PE "orte" because it only offers 7 slots
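
To make the mismatch concrete, here is a minimal Python sketch that counts the free slots shown by `qstat -f`. It is an illustration under stated assumptions, not how the scheduler itself counts: the function name `free_slots` is hypothetical, any queue instance with a non-empty states column (such as `au`) is treated as unavailable, and free slots are taken as total minus reserved minus used.

```python
def free_slots(qstat_f_output):
    """Sum free slots over queue instances with no state flags set."""
    total = 0
    for line in qstat_f_output.splitlines():
        fields = line.split()
        # Queue-instance rows: name qtype resv/used/tot load_avg arch [states]
        if len(fields) < 5 or "@" not in fields[0] or fields[2].count("/") != 2:
            continue
        resv, used, tot = (int(x) for x in fields[2].split("/"))
        states = fields[5] if len(fields) > 5 else ""
        if states:
            # Simplification: any state flag (e.g. "au") means "skip".
            continue
        total += tot - resv - used
    return total

# Sample rows copied from the qstat -f output in this message.
one_node_down = """\
all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
all.q@compute-0-1.local        BIP   0/0/4          -NA-     linux-x64     au
"""
both_nodes_up = """\
all.q@compute-0-0.local        BIP   0/0/4          0.00     linux-x64
all.q@compute-0-1.local        BIP   0/0/4          0.10     linux-x64
"""
print(free_slots(one_node_down), free_slots(both_nodes_up))
```

By this count the queue instances offer 4 free slots with one node down and 8 with both nodes up, so neither case matches the 7 slots reported for the PE.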


I've searched through the whole SGE configuration. I also reinstalled
the 2 nodes. But the same message appears, saying that only 7 slots
are available!

Can someone help me?


What do "qconf -sp orte" and "qconf -sq all.q" report?



--
-- Jérôme
- Why do you drink?
- I have been asked that question before, headmaster.
- Probably by people who are fond of you.
- Probably. Claire used to ask me three times a week: she must have adored me
        (Michel Audiard)
