Another thing to try is a change that we made late in the Open MPI v1.2 series with regard to IB:

    http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
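
If that entry applies to you, the fix amounts to disabling the
early-completion optimization with an MCA parameter.  A sketch of the
command line (pml_ob1_use_early_completion is my recollection of the
parameter named in that FAQ entry; the host file and executable are
placeholders):

    # Disable early completion on Open MPI v1.2.x
    shell$ mpirun --mca pml_ob1_use_early_completion 0 \
               -np 6 --hostfile myhosts ./your_app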

On Dec 24, 2008, at 10:07 PM, Tim Mattox wrote:

For your runs with Open MPI over InfiniBand, try using openib,sm,self
for the BTL setting, so that shared-memory communication is used
within a node.  That would give us another data point to help diagnose
the problem.  For anything else we might need, please follow the
advice in this FAQ entry and on the help page:
http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
http://www.open-mpi.org/community/help/
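
For example (process count, host file, and executable name are
placeholders):

    # "sm" lets on-node process pairs use shared memory instead of IB loopback
    shell$ mpirun --mca btl openib,sm,self -np 6 --hostfile myhosts ./your_app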

On Wed, Dec 24, 2008 at 5:55 AM, Biagio Lucini <b.luc...@swansea.ac.uk> wrote:
Pavel Shamis (Pasha) wrote:

Biagio Lucini wrote:

Hello,

I am new to this list, where I hope to find a solution for a problem
that I have been having for quite a long time.

I run various versions of Open MPI (from 1.1.2 to 1.2.8) on a cluster
with InfiniBand interconnects that I both use and administer.  The
OpenFabrics stack is OFED-1.2.5, the compilers are gcc 4.2 and Intel,
and the queue manager is SGE 6.0u8.

Do you use the Open MPI version that is included in OFED?  Were you
able to run basic OFED/OMPI tests/benchmarks between two nodes?


Hi,

yes to both questions: the OMPI version is the one that comes with OFED (1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is
more than basic, as far as I can see) reports for the last test:

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 6
#---------------------------------------------------
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       1000        22.93        22.95        22.94


for the openib,self btl (6 processes, all processes on different nodes)
and

#---------------------------------------------------
# Benchmarking Barrier
# #processes = 6
#---------------------------------------------------
#repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
       1000       191.30       191.42       191.34

for the tcp,self btl (same test)
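
For reference, the two runs above were launched with commands of
roughly this shape (the host file, listing six separate nodes, is a
placeholder):

    # InfiniBand run; Barrier is the last benchmark in the default IMB suite
    shell$ mpirun --mca btl openib,self -np 6 --hostfile sixnodes ./IMB-MPI1

    # TCP run over the same six nodes
    shell$ mpirun --mca btl tcp,self -np 6 --hostfile sixnodes ./IMB-MPI1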

No anomalies for the other tests (ping-pong, all-to-all, etc.).

Thanks,
Biagio


--
=========================================================

Dr. Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

=========================================================

--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
   I'm a bright... http://www.the-brights.net/

--
Jeff Squyres
Cisco Systems
