The suggestion will probably work, but it is not a real solution.
Choosing barrier synchronization is not recommended by the SKaMPI team, and it reduces the accuracy of the benchmark. The problem is either at the pml ob1 level or at the openib btl level, and it has to do with many messages being sent at the same time. You can reproduce this type of problem on 4-5 nodes over IB (on odin) with bcast or reduce using small segment sizes (1 KB, i.e. less than the eager size for IB); I do not think I saw it on 2 nodes. I haven't tried the one-sided operations, but if it happens there too, I am even more likely to believe in my theory :)

Thanks,
Jelena

Gleb Natapov wrote:
On Wed, Sep 19, 2007 at 01:58:35PM -0600, Edmund Sumbar wrote:
I'm trying to run skampi-5.0.1-r0191 under PBS
over IB with the command line

   mpirun -np 2 ./skampi -i coll.ski -o coll_ib.sko
Can you add choose_barrier_synchronization()
to coll.ski and try again? It looks like this one:
https://svn.open-mpi.org/trac/ompi/ticket/1015
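
For reference, a minimal sketch of the change, assuming the call belongs with the other global settings near the top of coll.ski, before the measurement blocks (the exact placement in your copy of the file may differ):

   choose_barrier_synchronization()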

The pt2pt and mmisc tests run to completion.
The coll and onesided tests, on the other hand,
start to produce output but then seem to hang.
Actually, the CPUs appear to be busy doing
something (I don't know what), but output stops.
The tests should only take on the order of minutes,
but I end up deleting the job after about 15 min.

All tests run to completion with --mca btl tcp,self.
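(For comparison, the full command line for the TCP run would look something like the following; the output file name is just an example.)

   mpirun --mca btl tcp,self -np 2 ./skampi -i coll.ski -o coll_tcp.sko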

Any suggestions as to how to diagnose this problem?
Are there any known issues with Open MPI over IB and
the SKaMPI benchmark?

(BTW, skampi works with mvapich2)

System details follow...

--
Ed[mund [Sumbar]]
AICT Research Support, Univ of Alberta


$ uname -a
Linux opteron-cluster.nic.ualberta.ca 2.6.21-smp #1 SMP Tue Aug 7 12:45:20 MDT 2007 x86_64 x86_64 x86_64 GNU/Linux

$ ./configure --prefix=/usr/local/openmpi-1.2.3 --with-tm=/opt/torque \
      --with-openib=/usr/lib --with-libnuma=/usr/lib64

$ ompi_info
                 Open MPI: 1.2.3
    Open MPI SVN revision: r15136
                 Open RTE: 1.2.3
    Open RTE SVN revision: r15136
                     OPAL: 1.2.3
        OPAL SVN revision: r15136
                   Prefix: /usr/local/openmpi-1.2.3
  Configured architecture: x86_64-unknown-linux-gnu
            Configured by: esumbar
            Configured on: Mon Sep 17 10:00:35 MDT 2007
           Configure host: opteron-cluster.nic.ualberta.ca
                 Built by: esumbar
                 Built on: Mon Sep 17 10:05:09 MDT 2007
               Built host: opteron-cluster.nic.ualberta.ca
               C bindings: yes
             C++ bindings: yes
       Fortran77 bindings: yes (all)
       Fortran90 bindings: yes
  Fortran90 bindings size: small
               C compiler: gcc
      C compiler absolute: /usr/bin/gcc
             C++ compiler: g++
    C++ compiler absolute: /usr/bin/g++
       Fortran77 compiler: gfortran
   Fortran77 compiler abs: /usr/bin/gfortran
       Fortran90 compiler: gfortran
   Fortran90 compiler abs: /usr/bin/gfortran
              C profiling: yes
            C++ profiling: yes
      Fortran77 profiling: yes
      Fortran90 profiling: yes
           C++ exceptions: no
           Thread support: posix (mpi: no, progress: no)
   Internal debug support: no
      MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
          libltdl support: yes
    Heterogeneous support: yes
  mpirun default --prefix: no
            MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.3)
               MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.3)
            MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.3)
            MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.3)
            MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.3)
                MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.3)
          MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.3)
          MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.3)
            MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
            MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                 MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.3)
                 MCA coll: self (MCA v1.0, API v1.0, Component v1.2.3)
                 MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.3)
                 MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.3)
                   MCA io: romio (MCA v1.0, API v1.0, Component v1.2.3)
                MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.3)
                MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.3)
               MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA btl: openib (MCA v1.0, API v1.0.1, Component v1.2.3)
                  MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.3)
                  MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.3)
                  MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                 MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.3)
               MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.3)
               MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.3)
               MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.3)
                   MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.3)
                   MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.3)
                  MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                  MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.3)
                MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.3)
                 MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.3)
                 MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.3)
                  MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.3)
                  MCA sds: env (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.3)
                  MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.3)

--
                        Gleb.