>> Here are the allocation info retrieved from `qstat -g t` for the related job:
>
> For me the output of `qstat -g t` shows MASTER and SLAVE entries but no
> variables. Is there any wrapper defined for `qstat` to reformat the output
> (or a ~/.sge_qstat defined)?
>
> [eg: ] sorry, i forgot about sge_qstat being defined. As I don't have any
> slot available right now, I cannot relaunch the job to get the output updated.

Reuti,

here is the output you asked for two days ago. It was produced by another
"bad" run, in which 3 processes are running on each of the nodes charlie and
carl, whereas we should have only 2 processes on carl and 4 on charlie.

Output from qstat -g t:
------------------------------------
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
smp...@carl.fft                BIP   0/2/4          1.14     lx-amd64
        hc:mem_available=1.715G
   1391 0.57643 semi_green jj           r     04/05/2012 15:41:04 SLAVE
                                                                  SLAVE
---------------------------------------------------------------------------------
smp...@charlie.fft             BIP   0/4/8          1.73     lx-amd64
        hc:mem_available=4.018G
   1391 0.57643 semi_green jj           r     04/05/2012 15:41:04 MASTER
                                                                  SLAVE
                                                                  SLAVE
                                                                  SLAVE
                                                                  SLAVE

Debug output from orterun:
------------------------------------
[charlie:08194] ras:gridengine: JOB_ID: 1391
[charlie:08194] ras:gridengine: PE_HOSTFILE: /opt/sge/default/spool/charlie/active_jobs/1391.1/pe_hostfile
[charlie:08194] ras:gridengine: charlie.fft: PE_HOSTFILE shows slots=4
[charlie:08194] ras:gridengine: carl.fft: PE_HOSTFILE shows slots=2

======================   ALLOCATED NODES   ======================

 Data for node: Name: charlie   Launch id: -1   Arch: ffc91200  State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[57575,0],0]   Daemon launched: True
        Num slots: 4    Slots in use: 0
        Num slots allocated: 4  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0
 Data for node: Name: carl.fft  Launch id: -1   Arch: 0 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: Not defined     Daemon launched: False
        Num slots: 2    Slots in use: 0
        Num slots allocated: 2  Max slots: 0
        Username on node: NULL
        Num procs: 0    Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0200
        Npernode: 0     Oversubscribe allowed: TRUE     CPU Lists: FALSE
        Num new daemons: 1      New daemon starting vpid 1
        Num nodes: 2

 Data for node: Name: charlie   Launch id: -1   Arch: ffc91200  State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[57575,0],0]   Daemon launched: True
        Num slots: 4    Slots in use: 3
        Num slots allocated: 4  Max slots: 0
        Username on node: NULL
        Num procs: 3    Next node_rank: 3
        Data for proc: [[57575,1],0]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[57575,1],2]
                Pid: 0  Local rank: 1   Node rank: 1
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[57575,1],4]
                Pid: 0  Local rank: 2   Node rank: 2
                State: 0        App_context: 0  Slot list: NULL

 Data for node: Name: carl.fft  Launch id: -1   Arch: 0 State: 2
        Num boards: 1   Num sockets/board: 2    Num cores/socket: 4
        Daemon: [[57575,0],1]   Daemon launched: False
        Num slots: 2    Slots in use: 3
        Num slots allocated: 2  Max slots: 0
        Username on node: NULL
        Num procs: 3    Next node_rank: 3
        Data for proc: [[57575,1],1]
                Pid: 0  Local rank: 0   Node rank: 0
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[57575,1],3]
                Pid: 0  Local rank: 1   Node rank: 1
                State: 0        App_context: 0  Slot list: NULL
        Data for proc: [[57575,1],5]
                Pid: 0  Local rank: 2   Node rank: 2
                State: 0        App_context: 0  Slot list: NULL

Regards,
Eloi
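
For reference, the placement in the map above (ranks 0, 2, 4 on charlie and 1, 3, 5 on carl) looks like what a round-robin over nodes would give, while the expected 4/2 split is what filling each node up to its slot count would give. The Python sketch below only illustrates that arithmetic with the slot counts from the PE_HOSTFILE lines. It is not Open MPI's mapper, the function names `map_by_slot` and `map_by_node` are made up for the illustration, and it makes no claim about what mapping policy 0200 actually means.

```python
from collections import Counter

# Slot counts as reported by ras:gridengine from the PE_HOSTFILE above.
slots = {"charlie": 4, "carl": 2}


def map_by_slot(slots, nprocs):
    """Hypothetical helper: fill each node up to its slot count in turn."""
    placement = []
    for node, count in slots.items():
        for _ in range(count):
            if len(placement) < nprocs:
                placement.append(node)
    return placement


def map_by_node(slots, nprocs):
    """Hypothetical helper: deal ranks out one node at a time, ignoring slot counts."""
    nodes = list(slots)
    return [nodes[rank % len(nodes)] for rank in range(nprocs)]


nprocs = sum(slots.values())  # 6 ranks, one per granted slot

by_slot = map_by_slot(slots, nprocs)
by_node = map_by_node(slots, nprocs)

print("by slot:", dict(Counter(by_slot)))
print("by node:", dict(Counter(by_node)))
print("ranks on charlie (by node):",
      [rank for rank, node in enumerate(by_node) if node == "charlie"])
print("ranks on carl (by node):",
      [rank for rank, node in enumerate(by_node) if node == "carl"])
```

The by-node result puts 3 ranks on carl, one more than the 2 slots granted there, which is the oversubscription visible in the "Slots in use: 3 / Num slots allocated: 2" lines for carl.fft.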