Hi Reuti,

Yes, the Open MPI binaries in use were built with --with-sge passed to configure, and those are the only binaries we use on our cluster.
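For completeness, the build boils down to something like the sketch below; the exact option set beyond --prefix and --with-sge is an approximation:

  ./configure --prefix=/opt/openmpi-1.3.3 --with-sge
  make all install

and a quick way to check that the gridengine support really made it in is:

  [eg@moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info | grep gridengine
  MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)

The full ompi_info output follows: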
[eg@moe:~]$ /opt/openmpi-1.3.3/bin/ompi_info
  Package: Open MPI root@moe Distribution
  Open MPI: 1.3.3
  Open MPI SVN revision: r21666
  Open MPI release date: Jul 14, 2009
  Open RTE: 1.3.3
  Open RTE SVN revision: r21666
  Open RTE release date: Jul 14, 2009
  OPAL: 1.3.3
  OPAL SVN revision: r21666
  OPAL release date: Jul 14, 2009
  Ident string: 1.3.3
  Prefix: /opt/openmpi-1.3.3
  Configured architecture: x86_64-unknown-linux-gnu
  Configure host: moe
  Configured by: root
  Configured on: Tue Nov 10 11:19:34 CET 2009
  Configure host: moe
  Built by: root
  Built on: Tue Nov 10 11:28:14 CET 2009
  Built host: moe
  C bindings: yes
  C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
  Fortran90 bindings size: small
  C compiler: gcc
  C compiler absolute: /usr/bin/gcc
  C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
  Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
  C profiling: yes
  C++ profiling: yes
  Fortran77 profiling: yes
  Fortran90 profiling: yes
  C++ exceptions: yes
  Thread support: posix (mpi: no, progress: no)
  Sparse Groups: no
  Internal debug support: no
  MPI parameter check: runtime
  Memory profiling support: no
  Memory debugging support: no
  libltdl support: yes
  Heterogeneous support: no
  mpirun default --prefix: no
  MPI I/O support: yes
  MPI_WTIME support: gettimeofday
  Symbol visibility support: yes
  FT Checkpoint support: no (checkpoint thread: no)
  MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.3)
  MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.3)
  MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.3)
  MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.3)
  MCA carto: file (MCA v2.0, API v2.0, Component v1.3.3)
  MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.3)
  MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.3)
  MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.3)
  MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.3)
  MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.3)
  MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.3)
  MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.3)
  MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.3)
  MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.3)
  MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.3)
  MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.3)
  MCA coll: self (MCA v2.0, API v2.0, Component v1.3.3)
  MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.3)
  MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.3)
  MCA io: romio (MCA v2.0, API v2.0, Component v1.3.3)
  MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.3)
  MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.3)
  MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
  MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
  MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
  MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
  MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
  MCA btl: gm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
  MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)
  MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.3)
  MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.3)
  MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.3)
  MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.3)
  MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.3)
  MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.3)
  MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.3)
  MCA odls: default (MCA v2.0, API v2.0, Component v1.3.3)
  MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
  MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.3)
  MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.3)
  MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.3)
  MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.3)
  MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.3)
  MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.3)
  MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.3)
  MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.3)
  MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.3)
  MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.3)
  MCA ess: env (MCA v2.0, API v2.0, Component v1.3.3)
  MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.3)
  MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.3)
  MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.3)
  MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.3)
  MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.3)
  MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.3)

Regards,
Eloi

On Friday 21 May 2010 16:01:54 Reuti wrote:
> Hi,
>
> On 21.05.2010 at 14:11, Eloi Gaudry wrote:
> > Hi there,
> >
> > I'm observing something strange on our cluster, managed by SGE 6.2u4, when
> > launching a parallel computation on several nodes using the Open MPI/SGE
> > tight-integration mode (Open MPI 1.3.3). It seems that the slots allocated
> > by SGE are not used by Open MPI, as if Open MPI were doing its own
> > round-robin allocation based on the allocated node hostnames.
>
> you compiled Open MPI with --with-sge (and recompiled your applications)?
> You are using the correct mpiexec?
>
> -- Reuti
>
> > Here is what I'm doing:
> > - launch a parallel computation involving 8 processors, each of them using
> >   14GB of memory. I'm using a qsub command where I request the memory_free
> >   resource and use tight integration with Open MPI.
> > - 3 servers are available:
> >   . barney with 4 cores (4 slots) and 32GB
> >   . carl with 4 cores (4 slots) and 32GB
> >   . charlie with 8 cores (8 slots) and 64GB
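For the record, the submission looks roughly like the sketch below; the script and job names are placeholders, and the exact -l resource syntax depends on how the complex is configured on our cluster. Only the PE name, the slot count and the Open MPI prefix are taken from the setup described here:

  $ cat job.sh
  #!/bin/bash
  #$ -N solver_job          # hypothetical job name
  #$ -pe round_robin 8      # request 8 slots from the round_robin PE
  #$ -l mem_free=14G        # memory request (exact complex name as configured)
  #$ -cwd
  # with tight integration, mpirun learns the granted hosts/slots from SGE;
  # --bynode is the option discussed further down
  /opt/openmpi-1.3.3/bin/mpirun --bynode ./solver_binary

  $ qsub job.sh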
> >
> > Here is the output of the allocated nodes (Open MPI output):
> >
> > ======================   ALLOCATED NODES   ======================
> >
> >  Data for node: Name: charlie     Launch id: -1  Arch: ffc91200  State: 2
> >     Daemon: [[44332,0],0]  Daemon launched: True
> >     Num slots: 4  Slots in use: 0
> >     Num slots allocated: 4  Max slots: 0
> >     Username on node: NULL
> >     Num procs: 0  Next node_rank: 0
> >
> >  Data for node: Name: carl.fft    Launch id: -1  Arch: 0  State: 2
> >     Daemon: Not defined  Daemon launched: False
> >     Num slots: 2  Slots in use: 0
> >     Num slots allocated: 2  Max slots: 0
> >     Username on node: NULL
> >     Num procs: 0  Next node_rank: 0
> >
> >  Data for node: Name: barney.fft  Launch id: -1  Arch: 0  State: 2
> >     Daemon: Not defined  Daemon launched: False
> >     Num slots: 2  Slots in use: 0
> >     Num slots allocated: 2  Max slots: 0
> >     Username on node: NULL
> >     Num procs: 0  Next node_rank: 0
> >
> > =================================================================
> >
> > Here is what I see when my computation is running on the cluster:
> >
> >   # rank   pid     hostname
> >     0      28112   charlie
> >     1      11417   carl
> >     2      11808   barney
> >     3      28113   charlie
> >     4      11418   carl
> >     5      11809   barney
> >     6      28114   charlie
> >     7      11419   carl
> >
> > Note that the parallel environment used under SGE is defined as:
> >
> > [eg@moe:~]$ qconf -sp round_robin
> > pe_name            round_robin
> > slots              32
> > user_lists         NONE
> > xuser_lists        NONE
> > start_proc_args    /bin/true
> > stop_proc_args     /bin/true
> > allocation_rule    $round_robin
> > control_slaves     TRUE
> > job_is_first_task  FALSE
> > urgency_slots      min
> > accounting_summary FALSE
> >
> > I'm wondering why Open MPI didn't use the allocation chosen by SGE (cf. the
> > "ALLOCATED NODES" report above) but instead placed the processes of the
> > parallel computation one node at a time, in a round-robin fashion.
> >
> > Note that I'm using the '--bynode' option on the orterun command line. If
> > the behavior I'm observing is simply a consequence of using this option,
> > please let me know. That would effectively mean that SGE tight integration
> > has a lower priority on orterun behavior than the command-line options.
> >
> > Any help would be appreciated,
> > Thanks,
> > Eloi
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Eloi Gaudry

Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM

Company Phone: +32 10 487 959
Company Fax:   +32 10 454 626
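P.S. To make the --bynode point concrete, here is the distinction as I understand it from the mpirun man page; the solver binary name below is just a placeholder:

  # default mapping (by slot): fill the slots granted on one host before
  # moving on to the next host in the allocation
  /opt/openmpi-1.3.3/bin/mpirun ./solver_binary

  # --bynode mapping: place one process per node in turn, cycling over the
  # granted nodes until all processes are placed
  /opt/openmpi-1.3.3/bin/mpirun --bynode ./solver_binary

The alternating charlie/carl/barney pattern in the rank/pid/hostname table above looks consistent with by-node placement, hence my question whether --bynode takes precedence over the per-host slot counts granted by SGE.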