Hi Reuti,

I've been unable to reproduce the issue so far.
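For reference, here is a minimal sketch of the kind of jobscript used here, with your check (a) added. The solver invocation (./my_solver) and the exact memory_free request syntax are placeholders rather than our real submission; the PE name, slot count and orterun option are the ones from the thread below:

  #!/bin/sh
  #$ -N ompi_sge_check
  #$ -pe round_robin 8
  #$ -l memory_free=14G
  #$ -cwd

  # check (a): dump the environment seen by the jobscript and look for
  # $JOB_ID and the various $SGE_* variables
  env | grep -E 'JOB_ID|SGE_|PE_HOSTFILE|NSLOTS'

  # parallel run (./my_solver stands in for the actual binary)
  /opt/openmpi-1.3.3/bin/orterun --bynode ./my_solver

For check (b), the process ancestry on the slave nodes can be inspected while the job is running with something like:

  ps -eo pid,ppid,cmd --forest

to see whether the tasks hang off sge_shepherd or off a system sshd/rshd.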
Sorry for the inconvenience,
Eloi

On Tuesday 25 May 2010 11:32:44 Reuti wrote:
> Hi,
>
> On 25.05.2010 at 09:14, Eloi Gaudry wrote:
> > I do not reset any environment variable during job submission or job
> > handling. Is there a simple way to check that openmpi is working as
> > expected with SGE tight integration (e.g., displaying environment
> > variables, setting options on the command line, etc.)?
>
> a) put a command:
>
> env
>
> in the jobscript and check the output for $JOB_ID and various $SGE_*
> variables.
>
> b) to confirm the misbehavior: are the tasks on the slave nodes kids of
> sge_shepherd or of any system sshd/rshd?
>
> -- Reuti
>
> > Regards,
> > Eloi
> >
> > On Friday 21 May 2010 17:35:24 Reuti wrote:
> >> Hi,
> >>
> >> On 21.05.2010 at 17:19, Eloi Gaudry wrote:
> >>> Hi Reuti,
> >>>
> >>> Yes, the openmpi binaries used were built after having used the
> >>> --with-sge option during configure, and we only use those binaries
> >>> on our cluster.
> >>>
> >>> [eg@moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
> >>>
> >>>          MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
> >>
> >> ok. As you have Tight Integration as the goal and set "control_slaves
> >> TRUE" in your PE, SGE wouldn't allow `qrsh -inherit ...` to nodes
> >> which are not in the list of granted nodes. So it looks like your job
> >> is running outside of this Tight Integration with its own `rsh` or
> >> `ssh`.
> >>
> >> Do you reset $JOB_ID or other environment variables in your jobscript,
> >> which could trigger Open MPI to assume that it's not running inside SGE?
> >>
> >> -- Reuti
> >>
> >>> On Friday 21 May 2010 16:01:54 Reuti wrote:
> >>>> Hi,
> >>>>
> >>>> On 21.05.2010 at 14:11, Eloi Gaudry wrote:
> >>>>> Hi there,
> >>>>>
> >>>>> I'm observing something strange on our cluster managed by SGE 6.2u4
> >>>>> when launching a parallel computation on several nodes, using the
> >>>>> OpenMPI/SGE tight-integration mode (OpenMPI-1.3.3). It seems that
> >>>>> the SGE-allocated slots are not used by OpenMPI, as if OpenMPI were
> >>>>> doing its own round-robin allocation based on the allocated node
> >>>>> hostnames.
> >>>>
> >>>> you compiled Open MPI with --with-sge (and recompiled your
> >>>> applications)? You are using the correct mpiexec?
> >>>>
> >>>> -- Reuti
> >>>>
> >>>>> Here is what I'm doing:
> >>>>> - launch a parallel computation involving 8 processors, each of
> >>>>>   them using 14GB of memory. I'm using a qsub command where I
> >>>>>   request the memory_free resource and use tight integration with
> >>>>>   openmpi
> >>>>> - 3 servers are available:
> >>>>>   . barney with 4 cores (4 slots) and 32GB
> >>>>>   . carl with 4 cores (4 slots) and 32GB
> >>>>>   . charlie with 8 cores (8 slots) and 64GB
> >>>>>
> >>>>> Here is the output of the allocated nodes (OpenMPI output):
> >>>>> ======================  ALLOCATED NODES  ======================
> >>>>>
> >>>>> Data for node: Name: charlie     Launch id: -1  Arch: ffc91200  State: 2
> >>>>>
> >>>>>   Daemon: [[44332,0],0]  Daemon launched: True
> >>>>>   Num slots: 4           Slots in use: 0
> >>>>>   Num slots allocated: 4 Max slots: 0
> >>>>>   Username on node: NULL
> >>>>>   Num procs: 0           Next node_rank: 0
> >>>>>
> >>>>> Data for node: Name: carl.fft    Launch id: -1  Arch: 0         State: 2
> >>>>>
> >>>>>   Daemon: Not defined    Daemon launched: False
> >>>>>   Num slots: 2           Slots in use: 0
> >>>>>   Num slots allocated: 2 Max slots: 0
> >>>>>   Username on node: NULL
> >>>>>   Num procs: 0           Next node_rank: 0
> >>>>>
> >>>>> Data for node: Name: barney.fft  Launch id: -1  Arch: 0         State: 2
> >>>>>
> >>>>>   Daemon: Not defined    Daemon launched: False
> >>>>>   Num slots: 2           Slots in use: 0
> >>>>>   Num slots allocated: 2 Max slots: 0
> >>>>>   Username on node: NULL
> >>>>>   Num procs: 0           Next node_rank: 0
> >>>>>
> >>>>> =================================================================
> >>>>>
> >>>>> Here is what I see when my computation is running on the cluster:
> >>>>> # rank   pid     hostname
> >>>>>
> >>>>>   0      28112   charlie
> >>>>>   1      11417   carl
> >>>>>   2      11808   barney
> >>>>>   3      28113   charlie
> >>>>>   4      11418   carl
> >>>>>   5      11809   barney
> >>>>>   6      28114   charlie
> >>>>>   7      11419   carl
> >>>>>
> >>>>> Note that the parallel environment used under SGE is defined as:
> >>>>> [eg@moe:~]$ qconf -sp round_robin
> >>>>> pe_name            round_robin
> >>>>> slots              32
> >>>>> user_lists         NONE
> >>>>> xuser_lists        NONE
> >>>>> start_proc_args    /bin/true
> >>>>> stop_proc_args     /bin/true
> >>>>> allocation_rule    $round_robin
> >>>>> control_slaves     TRUE
> >>>>> job_is_first_task  FALSE
> >>>>> urgency_slots      min
> >>>>> accounting_summary FALSE
> >>>>>
> >>>>> I'm wondering why OpenMPI didn't use the allocated nodes chosen by
> >>>>> SGE (cf. the "ALLOCATED NODES" report) but instead placed the
> >>>>> processes of the parallel computation one at a time, using a
> >>>>> round-robin method.
> >>>>>
> >>>>> Note that I'm using the '--bynode' option on the orterun command
> >>>>> line. If the behavior I'm observing is simply the consequence of
> >>>>> using this option, please let me know. This would essentially mean
> >>>>> that SGE tight-integration has a lower priority on orterun behavior
> >>>>> than the command-line options.
> >>>>>
> >>>>> Any help would be appreciated,
> >>>>> Thanks,
> >>>>> Eloi
> >>>>
> >>>> _______________________________________________
> >>>> users mailing list
> >>>> us...@open-mpi.org
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Eloi Gaudry

Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM

Company Phone: +32 10 487 959
Company Fax:   +32 10 454 626