Hi Reuti,

I configured Open MPI to support SGE tight integration and used the PE defined below for submitting the job:
[16:36][eg@moe:~]$ qconf -sp fill_up
pe_name            fill_up
slots              80
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

Here is the allocation info retrieved from `qstat -g t` for the related job:

---------------------------------------------------------------------------------
smp...@barney.fft              BIP   0/1/4          0.70     lx-amd64
        hc:num_proc=0
        hl:mem_free=31.215G
        hl:mem_used=280.996M
        hc:mem_available=1.715G
   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     1
---------------------------------------------------------------------------------
smp...@carl.fft                BIP   0/1/4          0.69     lx-amd64
        hc:num_proc=0
        hl:mem_free=30.764G
        hl:mem_used=742.805M
        hc:mem_available=1.715G
   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     1
---------------------------------------------------------------------------------
smp...@charlie.fft             BIP   0/2/8          0.57     lx-amd64
        hc:num_proc=0
        hl:mem_free=62.234G
        hl:mem_used=836.797M
        hc:mem_available=4.018G
   1296 0.54786 semi_direc jj           r     04/03/2012 16:43:49     2
---------------------------------------------------------------------------------

SGE reports what pls_gridengine_info reports, i.e. what was reserved. But here is the output of the current job (after it was started by Open MPI):

[charlie:05294] ras:gridengine: JOB_ID: 1296
[charlie:05294] ras:gridengine: PE_HOSTFILE: /opt/sge/default/spool/charlie/active_jobs/1296.1/pe_hostfile
[charlie:05294] ras:gridengine: charlie.fft: PE_HOSTFILE shows slots=2
[charlie:05294] ras:gridengine: barney.fft: PE_HOSTFILE shows slots=1
[charlie:05294] ras:gridengine: carl.fft: PE_HOSTFILE shows slots=1

======================   ALLOCATED NODES   ======================

 Data for node: Name: charlie      Launch id: -1  Arch: ffc91200  State: 2
        Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
        Daemon: [[54347,0],0]  Daemon launched: True
        Num slots: 2  Slots in use: 0
        Num slots allocated: 2  Max slots: 0
        Username on node: NULL
        Num procs: 0  Next node_rank: 0

 Data for node: Name: barney.fft   Launch id: -1  Arch: 0  State: 2
        Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
        Daemon: Not defined  Daemon launched: False
        Num slots: 1  Slots in use: 0
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0  Next node_rank: 0

 Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
        Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
        Daemon: Not defined  Daemon launched: False
        Num slots: 1  Slots in use: 0
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 0  Next node_rank: 0

=================================================================

 Map generated by mapping policy: 0200
        Npernode: 0  Oversubscribe allowed: TRUE  CPU Lists: FALSE
        Num new daemons: 2  New daemon starting vpid 1
        Num nodes: 3

 Data for node: Name: charlie      Launch id: -1  Arch: ffc91200  State: 2
        Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
        Daemon: [[54347,0],0]  Daemon launched: True
        Num slots: 2  Slots in use: 2
        Num slots allocated: 2  Max slots: 0
        Username on node: NULL
        Num procs: 2  Next node_rank: 2
        Data for proc: [[54347,1],0]
                Pid: 0  Local rank: 0  Node rank: 0
                State: 0  App_context: 0  Slot list: NULL
        Data for proc: [[54347,1],3]
                Pid: 0  Local rank: 1  Node rank: 1
                State: 0  App_context: 0  Slot list: NULL

 Data for node: Name: barney.fft   Launch id: -1  Arch: 0  State: 2
        Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
        Daemon: [[54347,0],1]  Daemon launched: False
        Num slots: 1  Slots in use: 1
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 1  Next node_rank: 1
        Data for proc: [[54347,1],1]
                Pid: 0  Local rank: 0  Node rank: 0
                State: 0  App_context: 0  Slot list: NULL

 Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
        Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
        Daemon: [[54347,0],2]  Daemon launched: False
        Num slots: 1  Slots in use: 1
        Num slots allocated: 1  Max slots: 0
        Username on node: NULL
        Num procs: 1  Next node_rank: 1
        Data for proc: [[54347,1],2]
                Pid: 0  Local rank: 0  Node rank: 0
                State: 0  App_context: 0  Slot list: NULL
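(For reference, the slot counts Open MPI prints above are read from the $PE_HOSTFILE that SGE writes for the job. A minimal way to capture both sides from inside the submitted job script, using only standard SGE and Open MPI commands; the sample hostfile contents in the comments are hypothetical, with the queue names left redacted as above:

    # From within the job script: show the machine file SGE handed to the job
    echo "pe_hostfile: $PE_HOSTFILE"
    cat "$PE_HOSTFILE"
    # Typical line format: <host> <slots> <queue> <processor range>, e.g.
    #   charlie.fft 2 smp...@charlie.fft UNDEFINED
    #   barney.fft  1 smp...@barney.fft  UNDEFINED
    #   carl.fft    1 smp...@carl.fft    UNDEFINED

    # On the submit/head node: confirm the gridengine components are compiled into this Open MPI build
    ompi_info | grep gridengine
)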
Regards,
Eloi


-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
Sent: Tuesday, April 3, 2012 16:24
To: Open MPI Users
Subject: Re: [OMPI users] sge tight intregration leads to bad allocation

Hi,

On 03.04.2012 at 16:12, Eloi Gaudry wrote:

> Thanks for your feedback.
> No, this is the other way around: the "reserved" slots on all nodes are OK, but the "used" slots are different.
>
> Basically, I'm using SGE to schedule and book resources for a distributed job. When the job is finally launched, it uses a different allocation than the one that was reported by pls_gridengine_info.
>
> pls_gridengine_info reports that 3 nodes were booked: barney (1 slot), carl (1 slot) and charlie (2 slots). This booking was done by SGE depending on the memory requirements of the job (among others).
>
> When orterun starts the job (i.e. when SGE finally starts the scheduled job), it uses 3 nodes, but the first one (barney: 2 slots instead of 1) is oversubscribed and the last one (charlie: 1 slot instead of 2) is underused.

You configured Open MPI to support SGE tight integration and used a PE for submitting the job? Can you please post the definition of the PE?

What was the allocation you saw in SGE's `qstat -g t` for the job?

-- Reuti

> If you need further information, please let me know.
>
> Eloi
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Tuesday, April 3, 2012 15:58
> To: Open MPI Users
> Subject: Re: [OMPI users] sge tight intregration leads to bad allocation
>
> I'm afraid there isn't enough info here to help. Are you saying you only allocated one slot/node, so the two slots on charlie is in error?
>
> Sent from my iPad
>
> On Apr 3, 2012, at 6:23 AM, "Eloi Gaudry" <eloi.gau...@fft.be> wrote:
>
> Hi,
>
> I've observed a strange behavior during rank allocation on a distributed run scheduled and submitted using SGE (Son of Grid Engine 8.0.0d) and Open MPI 1.4.4.
> Briefly, there is a one-slot difference between the rank/slot allocation seen by SGE and by Open MPI. The issue here is that one node becomes oversubscribed at runtime.
>
> Here is the output of the allocation done for gridengine:
>
> ======================   ALLOCATED NODES   ======================
>
>  Data for node: Name: barney       Launch id: -1  Arch: ffc91200  State: 2
>         Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>         Daemon: [[22904,0],0]  Daemon launched: True
>         Num slots: 1  Slots in use: 0
>         Num slots allocated: 1  Max slots: 0
>         Username on node: NULL
>         Num procs: 0  Next node_rank: 0
>
>  Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
>         Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>         Daemon: Not defined  Daemon launched: False
>         Num slots: 1  Slots in use: 0
>         Num slots allocated: 1  Max slots: 0
>         Username on node: NULL
>         Num procs: 0  Next node_rank: 0
>
>  Data for node: Name: charlie.fft  Launch id: -1  Arch: 0  State: 2
>         Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>         Daemon: Not defined  Daemon launched: False
>         Num slots: 2  Slots in use: 0
>         Num slots allocated: 2  Max slots: 0
>         Username on node: NULL
>         Num procs: 0  Next node_rank: 0
>
> And here is the allocation finally used:
>
> =================================================================
>
>  Map generated by mapping policy: 0200
>         Npernode: 0  Oversubscribe allowed: TRUE  CPU Lists: FALSE
>         Num new daemons: 2  New daemon starting vpid 1
>         Num nodes: 3
>
>  Data for node: Name: barney       Launch id: -1  Arch: ffc91200  State: 2
>         Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>         Daemon: [[22904,0],0]  Daemon launched: True
>         Num slots: 1  Slots in use: 2
>         Num slots allocated: 1  Max slots: 0
>         Username on node: NULL
>         Num procs: 2  Next node_rank: 2
>         Data for proc: [[22904,1],0]
>                 Pid: 0  Local rank: 0  Node rank: 0
>                 State: 0  App_context: 0  Slot list: NULL
>         Data for proc: [[22904,1],3]
>                 Pid: 0  Local rank: 1  Node rank: 1
>                 State: 0  App_context: 0  Slot list: NULL
>
>  Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
>         Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>         Daemon: [[22904,0],1]  Daemon launched: False
>         Num slots: 1  Slots in use: 1
>         Num slots allocated: 1  Max slots: 0
>         Username on node: NULL
>         Num procs: 1  Next node_rank: 1
>         Data for proc: [[22904,1],1]
>                 Pid: 0  Local rank: 0  Node rank: 0
>                 State: 0  App_context: 0  Slot list: NULL
>
>  Data for node: Name: charlie.fft  Launch id: -1  Arch: 0  State: 2
>         Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>         Daemon: [[22904,0],2]  Daemon launched: False
>         Num slots: 2  Slots in use: 1
>         Num slots allocated: 2  Max slots: 0
>         Username on node: NULL
>         Num procs: 1  Next node_rank: 1
>         Data for proc: [[22904,1],2]
>                 Pid: 0  Local rank: 0  Node rank: 0
>                 State: 0  App_context: 0  Slot list: NULL
>
> Has anyone already encountered the same behavior?
> Is there a simpler fix than not using the tight integration mode between SGE and Open MPI?
>
> Eloi
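(Editorial aside on the last question quoted above: "not using the tight integration" would amount to ignoring the SGE-provided allocation on the mpirun side. A minimal sketch using standard Open MPI 1.4 options, shown only for comparison rather than as a recommendation; the hostfile name, slot counts and binary name are hypothetical:

    # myhosts: hand-written machine file used instead of the SGE-generated $PE_HOSTFILE, e.g.
    #   barney.fft  slots=1
    #   carl.fft    slots=1
    #   charlie.fft slots=2
    # Exclude the gridengine allocation (ras) component and pass the explicit hostfile
    mpirun --mca ras ^gridengine --hostfile myhosts -np 4 ./my_mpi_app
)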