Hi,

On 26.07.2011, at 21:19, Lane, William wrote:
> I can successfully run the MPI testcode via OpenMPI 1.3.3 on less than 87
> slots w/both the btl_tcp_if_exclude and btl_tcp_if_include switches
> passed to mpirun.
>
> SGE always allocates the qsub jobs from the 24 slot nodes first -- up to the
> 96 slots that these 4 nodes have available (on the largeMem.q). The rest of
> the 602 slots are allocated from 2 slot nodes (all.q). All requests of up to
> 96 slots are serviced by the largeMem.q nodes (which have 24 slots apiece).
> Anything over 96 slots is serviced first by the largeMem.q nodes then by the
> all.q nodes.

Did you set up a JSV for this? A PE has no sequence numbers, so this ordering
cannot be enforced by the PE request alone.

> Here's the PE that I'm using:
>
> mpich PE (Parallel Environment) queue:
>
> pe_name            mpich
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args     /opt/gridengine/mpi/stopmpi.sh

Both of these can be set to NONE when Open MPI was compiled with the SGE
integration (--with-sge); see the sketch further below. NB: what is defined
for rsh_daemon/rsh_command in `qconf -sconf`?

> allocation_rule    $fill_up

Here you specify that each machine is filled up completely before slots are
gathered from the next one. You can change this to $round_robin to get one
slot from each node before a second one is taken from any particular machine.
If you prefer a fixed allocation, you could also put an integer here.

> control_slaves        TRUE
> job_is_first_task     FALSE
> urgency_slots         min
> accounting_summary    TRUE
>
> Wouldn't the -bynode allocation be really inefficient? Does the -bynode
> switch imply only one slot is used on each node before it moves on to the
> next?

Do I get it right: within the slots granted by SGE, you want the process
placement inside Open MPI to follow a specific pattern, i.e. which rank ends
up on which node?

-- Reuti

> Thanks for your help Ralph. At least I have some ideas on where to look now.
>
> -Bill
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
> Ralph Castain [r...@open-mpi.org]
> Sent: Tuesday, July 26, 2011 6:32 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in
> cluster, but nothing more than that
>
> A few thoughts:
>
> * including both btl_tcp_if_include and btl_tcp_if_exclude is problematic as
> they are mutually exclusive options. I'm not sure which one will take
> precedence. I would suggest only using one of them.
>
> * the default mapping algorithm is byslot - i.e., OMPI will place procs on
> each node of the cluster until all slots on that node have been filled, and
> then moves to the next node. Depending on what you have in your machinefile,
> it is possible that all 88 procs are being placed on the first node. You
> might try spreading your procs across all nodes with -bynode on the cmd line,
> or check to ensure that the machinefile is correctly specifying the number of
> slots on each node. Note: OMPI will automatically read the SGE environment to
> get the host allocation, so the only reason for providing a machinefile is if
> you don't want the full allocation used.
>
> * 88*88 = 7744. MPI transport connections are point-to-point - i.e., each
> proc opens a unique connection to another proc. If your procs are all winding
> up on the same node, for example, then the system will want at least 7744
> file descriptors on that node, assuming your application does a complete
> wireup across all procs.
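Following up on the first two points: as the allocation comes from SGE anyway,
the machinefile can simply be dropped, and only one of the two BTL interface
lists should be given. A minimal sketch of the job-script line, using the
interface names from this thread (eth0 for MPI traffic, eth1 towards the NAS)
and -bynode only if spreading the ranks across the nodes is really what you
want:

    # inside the SGE job script; NSLOTS is set by SGE for a PE job
    mpirun -n $NSLOTS -bynode \
           --mca btl_tcp_if_include eth0 \
           --mca oob_tcp_if_exclude eth1 \
           /stf/billstst/ProcessColors2MPICH1
    echo "mpirun returned $?"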
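And regarding the third point: 88 fully wired ranks need on the order of
88 * 88 = 7744 TCP connections in total, so a per-process limit of 1024 or
even 4096 can still be too small on a node which hosts many ranks. It is also
worth checking which limit the job really inherits, as it is not necessarily
the one you see in an interactive shell. A sketch (the values below are only
an example):

    # put this near the top of the job script to see the limit the job gets
    echo "open files: soft $(ulimit -Sn), hard $(ulimit -Hn)"

    # example only: raise the limit for logins in /etc/security/limits.conf
    #   *   soft   nofile   16384
    #   *   hard   nofile   16384
    # batch jobs inherit their limits from the sge_execd which spawned them,
    # so the execd's start-up environment (init script) may need the same
    # ulimit setting and a restart of the daemon afterwards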
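And for completeness, the PE quoted above can be slimmed down for a tight
integration. The sketch below assumes Open MPI was really built with
--with-sge and that rsh_command/rsh_daemon in `qconf -sconf` point to a
working remote-startup method, since mpirun will then start its daemons via
`qrsh -inherit`:

    $ qconf -sp mpich
    pe_name            mpich
    slots              9999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    NONE
    stop_proc_args     NONE
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary TRUE

With such a setup there is no need to pass -machinefile $TMPDIR/machines at
all; mpirun gets the list of granted nodes and slots directly from SGE.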
>
> Updating to 1.4.3 would be a good idea as it is more stable, but it may not
> resolve this problem if the issue is one of the above.
>
> HTH
> Ralph
>
>
> On Jul 25, 2011, at 11:23 PM, Lane, William wrote:
>
>> Please help me resolve the following problems with a 306 node Rocks cluster
>> using SGE. Please note I can run the job successfully on <87 slots, but not
>> any more than that.
>>
>> We're running SGE and I'm submitting my jobs via the SGE CLI utility qsub
>> and the following lines from a script:
>>
>> mpirun -n $NSLOTS -machinefile $TMPDIR/machines --mca btl_tcp_if_include eth0 --mca btl_tcp_if_exclude eth1 --mca oob_tcp_if_exclude eth1 --mca opal_set_max_sys_limits 1 --mca pls_gridengine_verbose 1 /stf/billstst/ProcessColors2MPICH1
>> echo "MPICH1 mpirun returned #?"
>>
>> eth1 is the connection to the Isilon NAS, where the object file is located.
>>
>> The error messages returned are of the form:
>>
>> WRT ORTE_ERROR_LOG: The system limit on number of pipes a process can open
>> was reached
>> WRT ORTE_ERROR_LOG: The system limit on number of network connections a
>> process can open was reached in file oob_tcp.c at line 447
>>
>> We have increased the open file limit to 4096 from 1024, problem still
>> exists.
>>
>> I can run the same test code via MPICH2 successfully on all 696 slots of the
>> cluster, but I can't run the same code (compiled via OpenMPI version 1.3.3)
>> on any more than 86 slots.
>>
>> Here's the details on the installed version of Open MPI:
>>
>> [root]# ./ompi_info
>> Package: Open MPI r...@build-x86-64.rocksclusters.org Distribution
>> Open MPI: 1.3.3
>> Open MPI SVN revision: r21666
>> Open MPI release date: Jul 14, 2009
>> Open RTE: 1.3.3
>> Open RTE SVN revision: r21666
>> Open RTE release date: Jul 14, 2009
>> OPAL: 1.3.3
>> OPAL SVN revision: r21666
>> OPAL release date: Jul 14, 2009
>> Ident string: 1.3.3
>> Prefix: /opt/openmpi
>> Configured architecture: x86_64-unknown-linux-gnu
>> Configure host: build-x86-64.rocksclusters.org
>> Configured by: root
>> Configured on: Sat Dec 12 16:29:23 PST 2009
>> Configure host: build-x86-64.rocksclusters.org
>> Built by: bruno
>> Built on: Sat Dec 12 16:42:52 PST 2009
>> Built host: build-x86-64.rocksclusters.org
>> C bindings: yes
>> C++ bindings: yes
>> Fortran77 bindings: yes (all)
>> Fortran90 bindings: yes
>> Fortran90 bindings size: small
>> C compiler: gcc
>> C compiler absolute: /usr/bin/gcc
>> C++ compiler: g++
>> C++ compiler absolute: /usr/bin/g++
>> Fortran77 compiler: gfortran
>> Fortran77 compiler abs: /usr/bin/gfortran
>> Fortran90 compiler: gfortran
>> Fortran90 compiler abs: /usr/bin/gfortran
>> C profiling: yes
>> C++ profiling: yes
>> Fortran77 profiling: yes
>> Fortran90 profiling: yes
>> C++ exceptions: no
>> Thread support: posix (mpi: no, progress: no)
>> Sparse Groups: no
>> Internal debug support: no
>> MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>> libltdl support: yes
>> Heterogeneous support: no
>> mpirun default --prefix: no
>> MPI I/O support: yes
>> MPI_WTIME support: gettimeofday
>> Symbol visibility support: yes
>> FT Checkpoint support: no (checkpoint thread: no)
>> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA carto: file (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: self (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA io: romio (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pml: csum (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA pml: v (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA btl: self (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA odls: default (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: env (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.3)
>> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.3)
>>
>> Would upgrading to the latest version of OpenMPI (1.4.3) resolve this issue?
>>
>> Thank you,
>>
>> -Bill Lane
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users