On 03.04.2012, at 17:24, Eloi Gaudry wrote:

> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
> Sent: Tuesday, April 3, 2012 17:13
> To: Open MPI Users
> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
>
> On 03.04.2012, at 16:59, Eloi Gaudry wrote:
>
>> Hi Reuti,
>>
>> I configured Open MPI to support SGE tight integration and used the PE defined below for submitting the job:
>>
>> [16:36][eg@moe:~]$ qconf -sp fill_up
>> pe_name            fill_up
>> slots              80
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $fill_up
>
> It should fill a host completely before moving to the next one with this definition.
>
> [eg: ] Yes, and it should also make sure that all hard requirements are met. Note that the allocation done by SGE is correct here; it is what Open MPI finally does at startup that is different (and incorrect).
>
>
>> control_slaves     TRUE
>> job_is_first_task  FALSE
>> urgency_slots      min
>> accounting_summary FALSE
>>
>> Here is the allocation info retrieved from `qstat -g t` for the related job:
>
> For me the output of `qstat -g t` shows MASTER and SLAVE entries but no variables. Is there any wrapper defined for `qstat` to reformat the output (or a ~/.sge_qstat defined)?
>
> [eg: ] Sorry, I forgot about sge_qstat being defined. As I don't have any slot available right now, I cannot relaunch the job to get the output updated.
>
> And why is "num_proc=0" output everywhere - was it redefined? Usually it's a load sensor set to the number of cores found in the machine and shouldn't be touched by hand by making it a consumable complex.
>
> [eg: ] My mistake, I think: this was made a consumable complex so that we could easily schedule multithreaded and parallel jobs on the cluster. I guess I should define another complex (proc_available), make it consumable and consume from this complex instead of touching the num_proc load sensor one then...

No. Also a threaded job is a parallel one with allocation_rule $pe_slots, no custom complex necessary. Often such a PE is called "smp".
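For illustration only (the values are just an example - adjust the slot limit and attach the PE to your queues as needed), such a PE could be added with `qconf -ap smp` and look like this:

   pe_name            smp
   slots              999
   user_lists         NONE
   xuser_lists        NONE
   start_proc_args    /bin/true
   stop_proc_args     /bin/true
   allocation_rule    $pe_slots
   control_slaves     TRUE
   job_is_first_task  FALSE
   urgency_slots      min
   accounting_summary FALSE

A multithreaded job would then request it with e.g. `qsub -pe smp 4 ...`, and $pe_slots guarantees that all 4 slots are granted on a single host, so num_proc can stay a plain load sensor.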
So, for now we can't solve the initial issue.

-- Reuti

>
> -- Reuti
>
>
>> ---------------------------------------------------------------------------------
>> smp...@barney.fft              BIP   0/1/4          0.70     lx-amd64
>>      hc:num_proc=0
>>      hl:mem_free=31.215G
>>      hl:mem_used=280.996M
>>      hc:mem_available=1.715G
>>    1296 0.54786 semi_direc jj          r     04/03/2012 16:43:49     1
>> ---------------------------------------------------------------------------------
>> smp...@carl.fft                BIP   0/1/4          0.69     lx-amd64
>>      hc:num_proc=0
>>      hl:mem_free=30.764G
>>      hl:mem_used=742.805M
>>      hc:mem_available=1.715G
>>    1296 0.54786 semi_direc jj          r     04/03/2012 16:43:49     1
>> ---------------------------------------------------------------------------------
>> smp...@charlie.fft             BIP   0/2/8          0.57     lx-amd64
>>      hc:num_proc=0
>>      hl:mem_free=62.234G
>>      hl:mem_used=836.797M
>>      hc:mem_available=4.018G
>>    1296 0.54786 semi_direc jj          r     04/03/2012 16:43:49     2
>> ---------------------------------------------------------------------------------
>>
>> SGE reports what pls_gridengine_report does, i.e. what was reserved.
>> But here is the output of the current job (after being started by Open MPI):
>> [charlie:05294] ras:gridengine: JOB_ID: 1296
>> [charlie:05294] ras:gridengine: PE_HOSTFILE: /opt/sge/default/spool/charlie/active_jobs/1296.1/pe_hostfile
>> [charlie:05294] ras:gridengine: charlie.fft: PE_HOSTFILE shows slots=2
>> [charlie:05294] ras:gridengine: barney.fft: PE_HOSTFILE shows slots=1
>> [charlie:05294] ras:gridengine: carl.fft: PE_HOSTFILE shows slots=1
>>
>> ======================   ALLOCATED NODES   ======================
>>
>> Data for node: Name: charlie      Launch id: -1  Arch: ffc91200  State: 2
>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>      Daemon: [[54347,0],0]  Daemon launched: True
>>      Num slots: 2  Slots in use: 0
>>      Num slots allocated: 2  Max slots: 0
>>      Username on node: NULL
>>      Num procs: 0  Next node_rank: 0
>> Data for node: Name: barney.fft   Launch id: -1  Arch: 0  State: 2
>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>      Daemon: Not defined  Daemon launched: False
>>      Num slots: 1  Slots in use: 0
>>      Num slots allocated: 1  Max slots: 0
>>      Username on node: NULL
>>      Num procs: 0  Next node_rank: 0
>> Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>      Daemon: Not defined  Daemon launched: False
>>      Num slots: 1  Slots in use: 0
>>      Num slots allocated: 1  Max slots: 0
>>      Username on node: NULL
>>      Num procs: 0  Next node_rank: 0
>>
>> =================================================================
>>
>> Map generated by mapping policy: 0200
>>      Npernode: 0  Oversubscribe allowed: TRUE  CPU Lists: FALSE
>>      Num new daemons: 2  New daemon starting vpid 1
>>      Num nodes: 3
>>
>> Data for node: Name: charlie      Launch id: -1  Arch: ffc91200  State: 2
>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>      Daemon: [[54347,0],0]  Daemon launched: True
>>      Num slots: 2  Slots in use: 2
>>      Num slots allocated: 2  Max slots: 0
>>      Username on node: NULL
>>      Num procs: 2  Next node_rank: 2
>>      Data for proc: [[54347,1],0]
>>              Pid: 0  Local rank: 0  Node rank: 0
>>              State: 0  App_context: 0  Slot list: NULL
>>      Data for proc: [[54347,1],3]
>>              Pid: 0  Local rank: 1  Node rank: 1
>>              State: 0  App_context: 0  Slot list: NULL
>> Data for node: Name: barney.fft   Launch id: -1  Arch: 0  State: 2
>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>      Daemon: [[54347,0],1]  Daemon launched: False
>>      Num slots: 1  Slots in use: 1
>>      Num slots allocated: 1  Max slots: 0
>>      Username on node: NULL
>>      Num procs: 1  Next node_rank: 1
>>      Data for proc: [[54347,1],1]
>>              Pid: 0  Local rank: 0  Node rank: 0
>>              State: 0  App_context: 0  Slot list: NULL
>>
>> Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 4
>>      Daemon: [[54347,0],2]  Daemon launched: False
>>      Num slots: 1  Slots in use: 1
>>      Num slots allocated: 1  Max slots: 0
>>      Username on node: NULL
>>      Num procs: 1  Next node_rank: 1
>>      Data for proc: [[54347,1],2]
>>              Pid: 0  Local rank: 0  Node rank: 0
>>              State: 0  App_context: 0  Slot list: NULL
>>
>> Regards,
>> Eloi
>>
>>
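(A side note on the quoted output above: Open MPI's ras:gridengine module takes those slot counts directly from $PE_HOSTFILE, so for this job the file should have contained roughly the lines below. This is only reconstructed from the output above, not the real file, and the queue column is a guess:

   charlie.fft 2 smp...@charlie.fft UNDEFINED
   barney.fft 1 smp...@barney.fft UNDEFINED
   carl.fft 1 smp...@carl.fft UNDEFINED

i.e. one line per host: hostname, granted slots, queue instance, processor range.)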
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Reuti
>> Sent: Tuesday, April 3, 2012 16:24
>> To: Open MPI Users
>> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
>>
>> Hi,
>>
>> On 03.04.2012, at 16:12, Eloi Gaudry wrote:
>>
>>> Thanks for your feedback.
>>> No, it is the other way around: the "reserved" slots on all nodes are OK, but the "used" slots are different.
>>>
>>> Basically, I'm using SGE to schedule and book resources for a distributed job. When the job is finally launched, it uses a different allocation than the one that was reported by pls_gridengine_info.
>>>
>>> The pls_gridengine_info report states that 3 nodes were booked: barney (1 slot), carl (1 slot) and charlie (2 slots). This booking was done by SGE depending on the memory requirements of the job (among others).
>>>
>>> When orterun starts the job (i.e. when SGE finally starts the scheduled job), it uses 3 nodes, but the first one (barney: 2 slots used instead of 1) is oversubscribed and the last one (charlie: 1 slot used instead of 2) is underused.
>>
>> You configured Open MPI to support SGE tight integration and used a PE for submitting the job? Can you please post the definition of the PE.
>>
>> What was the allocation you saw in SGE's `qstat -g t` for the job?
>>
>> -- Reuti
>>
>>
>>> If you need further information, please let me know.
>>>
>>> Eloi
>>>
>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
>>> Sent: Tuesday, April 3, 2012 15:58
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
>>>
>>> I'm afraid there isn't enough info here to help. Are you saying you only allocated one slot/node, so the two slots on charlie are in error?
>>>
>>> Sent from my iPad
>>>
>>> On Apr 3, 2012, at 6:23 AM, "Eloi Gaudry" <eloi.gau...@fft.be> wrote:
>>>
>>> Hi,
>>>
>>> I've observed a strange behavior during rank allocation for a distributed run scheduled and submitted using SGE (Son of Grid Engine 8.0.0d) and Open MPI 1.4.4.
>>> Briefly, there is a one-slot difference between the slots allocated by SGE and those used by Open MPI. The issue here is that one node becomes oversubscribed at runtime.
>>>
>>> Here is the output of the allocation done for gridengine:
>>>
>>> ======================   ALLOCATED NODES   ======================
>>>
>>> Data for node: Name: barney       Launch id: -1  Arch: ffc91200  State: 2
>>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>>      Daemon: [[22904,0],0]  Daemon launched: True
>>>      Num slots: 1  Slots in use: 0
>>>      Num slots allocated: 1  Max slots: 0
>>>      Username on node: NULL
>>>      Num procs: 0  Next node_rank: 0
>>> Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
>>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>>      Daemon: Not defined  Daemon launched: False
>>>      Num slots: 1  Slots in use: 0
>>>      Num slots allocated: 1  Max slots: 0
>>>      Username on node: NULL
>>>      Num procs: 0  Next node_rank: 0
>>> Data for node: Name: charlie.fft  Launch id: -1  Arch: 0  State: 2
>>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>>      Daemon: Not defined  Daemon launched: False
>>>      Num slots: 2  Slots in use: 0
>>>      Num slots allocated: 2  Max slots: 0
>>>      Username on node: NULL
>>>      Num procs: 0  Next node_rank: 0
>>>
>>>
>>> And here is the allocation finally used:
>>> =================================================================
>>>
>>> Map generated by mapping policy: 0200
>>>      Npernode: 0  Oversubscribe allowed: TRUE  CPU Lists: FALSE
>>>      Num new daemons: 2  New daemon starting vpid 1
>>>      Num nodes: 3
>>>
>>> Data for node: Name: barney       Launch id: -1  Arch: ffc91200  State: 2
>>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>>      Daemon: [[22904,0],0]  Daemon launched: True
>>>      Num slots: 1  Slots in use: 2
>>>      Num slots allocated: 1  Max slots: 0
>>>      Username on node: NULL
>>>      Num procs: 2  Next node_rank: 2
>>>      Data for proc: [[22904,1],0]
>>>              Pid: 0  Local rank: 0  Node rank: 0
>>>              State: 0  App_context: 0
>>>              Slot list: NULL
>>>      Data for proc: [[22904,1],3]
>>>              Pid: 0  Local rank: 1  Node rank: 1
>>>              State: 0  App_context: 0
>>>              Slot list: NULL
>>>
>>> Data for node: Name: carl.fft     Launch id: -1  Arch: 0  State: 2
>>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>>      Daemon: [[22904,0],1]  Daemon launched: False
>>>      Num slots: 1  Slots in use: 1
>>>      Num slots allocated: 1  Max slots: 0
>>>      Username on node: NULL
>>>      Num procs: 1  Next node_rank: 1
>>>      Data for proc: [[22904,1],1]
>>>              Pid: 0  Local rank: 0  Node rank: 0
>>>              State: 0  App_context: 0
>>>              Slot list: NULL
>>>
>>> Data for node: Name: charlie.fft  Launch id: -1  Arch: 0  State: 2
>>>      Num boards: 1  Num sockets/board: 2  Num cores/socket: 2
>>>      Daemon: [[22904,0],2]  Daemon launched: False
>>>      Num slots: 2  Slots in use: 1
>>>      Num slots allocated: 2  Max slots: 0
>>>      Username on node: NULL
>>>      Num procs: 1  Next node_rank: 1
>>>      Data for proc: [[22904,1],2]
>>>              Pid: 0  Local rank: 0  Node rank: 0
>>>              State: 0  App_context: 0
>>>              Slot list: NULL
>>>
>>> Has anyone already encountered the same behavior?
>>> Is there a simpler fix than not using the tight integration mode between SGE and Open MPI?
>>>
>>> Eloi
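P.S. For the next run, once a slot is free again, it may help to dump what SGE granted right inside the job script, before mpirun is called, so it can be compared with Open MPI's map for the very same job. A minimal sketch (the slot count and application name are placeholders):

   #!/bin/sh
   #$ -pe fill_up 4
   #$ -cwd
   cat $PE_HOSTFILE           # the allocation SGE granted to this job
   mpirun ./your_application  # with tight integration, Open MPI reads the same file

That way the pe_hostfile contents and the "ALLOCATED NODES" / map output can be compared directly.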