Dear Skylar,
I checked these too, and everything looks normal:
$ qconf -sp orte
pe_name orte
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
$ qconf -sq all.q
qname all.q
hostlist @allhosts
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make mpi mpich orte thread
rerun FALSE
slots 1,[compute-0-0.local=4],[compute-0-1.local=4]
tmpdir /tmp
shell /bin/csh
prolog NONE
epilog NONE
shell_start_mode unix_behavior
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
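As a quick cross-check of the two outputs above, here is a minimal Python sketch that adds up the per-host overrides in the `slots` line of all.q and compares the result with the PE limit and the 8 requested slots. The helper name `parse_slots` is hypothetical, and the values are copied by hand from the output above rather than read from `qconf`. Any member of @allhosts without an explicit override would get the default value (1 here); such hosts are not counted, because the members of @allhosts are not shown.

```python
import re


def parse_slots(value):
    """Split a queue 'slots' value into (default, {host: slots}).

    Hypothetical helper for illustration; it only understands the
    'default,[host=n],[host=n]' form shown in the output above.
    """
    default = None
    per_host = {}
    for part in value.split(","):
        part = part.strip()
        match = re.fullmatch(r"\[(.+)=(\d+)\]", part)
        if match:
            per_host[match.group(1)] = int(match.group(2))
        else:
            default = int(part)
    return default, per_host


# Values copied from "qconf -sq all.q" and "qconf -sp orte" above.
slots_value = "1,[compute-0-0.local=4],[compute-0-1.local=4]"
pe_slots = 9999
requested = 8

default, per_host = parse_slots(slots_value)
configured_total = sum(per_host.values())

print("default slots per host:", default)
print("per-host overrides:", per_host)
print("configured total on the two compute nodes:", configured_total)
print("enough for the request:", min(configured_total, pe_slots) >= requested)
```

On paper, then, the two compute nodes together provide the 8 slots the job asks for, so the configured values alone do not account for the "7 slots" message.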
Thanks
Regards
On 24/10/2016 at 12:20, Skylar Thompson wrote:
On Mon, Oct 24, 2016 at 12:15:41PM -0500, Jerome wrote:
Dear all,
I've installed a Rocks cluster of 2 nodes with SGE for a course. Each
node has 4 cores.
I shut down one node, so only 4 cores are available:
$ qstat -f
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
all.q@compute-0-0.local BIP 0/0/4 0.00 linux-x64
---------------------------------------------------------------------------------
all.q@compute-0-1.local BIP 0/0/4 -NA- linux-x64 au
But I've run into a strange issue that I can't explain yet:
My user submits a parallel job requesting 8 cores.
When I check the job, which stays in the "qw" state, I get this message:
$ qstat -j 58
../..
scheduling info: queue instance "all.q@compute-0-1.local" dropped because it is temporarily not available
                 cannot run in PE "orte" because it only offers 7 slots
If I power on the second node, the message is the same:
$ qstat -f
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
all.q@compute-0-0.local BIP 0/0/4 0.00 linux-x64
---------------------------------------------------------------------------------
all.q@compute-0-1.local BIP 0/0/4 0.10 linux-x64
$ qstat -j 58
../..
parallel environment: orte range: 8
version: 3
scheduling info: cannot run in PE "orte" because it only offers 7 slots
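The two `qstat -f` listings above can be compared with the 8 requested slots using a small Python sketch like the one below. The function name `usable_slots` is hypothetical. It assumes one line per queue instance with the state letters at the end of that line, treats any instance with a state flag (such as "au") as unavailable, and ignores the reserved column. These are simplifying assumptions, not a description of how the SGE scheduler actually counts slots.

```python
def usable_slots(qstat_f_output):
    """Sum free slots over queue instances that show no state flags.

    Hypothetical helper: expects lines of the form
    'queue@host qtype resv/used/tot load_avg arch [states]'.
    """
    total = 0
    for line in qstat_f_output.splitlines():
        fields = line.split()
        # Skip the header, separator lines and anything else that is
        # not a queue instance line.
        if len(fields) < 5 or "@" not in fields[0]:
            continue
        _resv, used, tot = (int(n) for n in fields[2].split("/"))
        states = fields[5] if len(fields) > 5 else ""
        if states:
            # Simplification: any state flag means "not usable".
            continue
        total += tot - used
    return total


# Queue instance lines copied from the two listings above.
ONE_NODE_DOWN = """\
all.q@compute-0-0.local BIP 0/0/4 0.00 linux-x64
all.q@compute-0-1.local BIP 0/0/4 -NA- linux-x64 au
"""

BOTH_NODES_UP = """\
all.q@compute-0-0.local BIP 0/0/4 0.00 linux-x64
all.q@compute-0-1.local BIP 0/0/4 0.10 linux-x64
"""

requested = 8
for label, listing in (("one node down", ONE_NODE_DOWN),
                       ("both nodes up", BOTH_NODES_UP)):
    free = usable_slots(listing)
    print(label, "->", free, "free slots; enough:", free >= requested)
```

Under these assumptions the count is 4 with one node down and 8 with both nodes up, so the listings by themselves do not explain why the scheduler reports 7.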
I've searched through the whole SGE configuration, and I also
reinstalled both nodes. But the same message still appears, saying
that only 7 slots are available!
Can someone help me?
What do "qconf -sp orte" and "qconf -sq all.q" report?
--
-- Jérôme
- Why do you drink?
- I have already been asked that question, Headmaster.
- Probably by people who are fond of you.
- Probably. Claire used to ask me three times a week: she must have adored me.
(Michel Audiard)