The suggestion will probably work, but it is not a solution. "Choosing barrier synchronization" is not recommended by the SKaMPI team, and it reduces the accuracy of the benchmark. The problem is either at the pml ob1 level or at the btl ib level, and it has to do with many messages being sent at the same time. You can reproduce this type of problem on 4-5 nodes over IB (on odin) using bcast or reduce with small segment sizes (1 KB, less than the eager size for ib). (I do not think I saw it on 2 nodes.) I haven't tried it with onesided operations, but if it happens there too, I am even more likely to believe in my theory :)

Thanks,
Jelena

Gleb Natapov wrote:
> On Wed, Sep 19, 2007 at 01:58:35PM -0600, Edmund Sumbar wrote:
> > I'm trying to run skampi-5.0.1-r0191 under PBS
> > over IB with the command line
> >
> >   mpirun -np 2 ./skampi -i coll.ski -o coll_ib.sko
>
> Can you add
> choose_barrier_synchronization()
> to coll.ski and try again? It looks like this one:
> https://svn.open-mpi.org/trac/ompi/ticket/1015
>
> > The pt2pt and mmisc tests run to completion.
> > The coll and onesided tests, on the other hand,
> > start to produce output but then seem to hang.
> > Actually, the cpus appear to be busy doing
> > something (I don't know what), but output stops.
> > The tests should only last on the order of minutes,
> > but I end up deleting the job after about 15 min.
> >
> > All tests run to completion with --mca btl tcp,self
> >
> > Any suggestions as to how to diagnose this problem?
> > Are there any known issues with OpenMPI/IB and the
> > SKaMPI benchmark?
> >
> > (BTW, skampi works with mvapich2)
> >
> > System details follow...
> >
> > --
> > Ed[mund [Sumbar]]
> > AICT Research Support, Univ of Alberta
> >
> > $ uname -a
> > Linux opteron-cluster.nic.ualberta.ca 2.6.21-smp #1 SMP Tue Aug 7 12:45:20 MDT 2007 x86_64 x86_64 x86_64 GNU/Linux
> >
> > $ ./configure --prefix=/usr/local/openmpi-1.2.3 --with-tm=/opt/torque --with-openib=/usr/lib --with-libnuma=/usr/lib64
> >
> > $ ompi_info
> > Open MPI: 1.2.3
> > Open MPI SVN revision: r15136
> > Open RTE: 1.2.3
> > Open RTE SVN revision: r15136
> > OPAL: 1.2.3
> > OPAL SVN revision: r15136
> > Prefix: /usr/local/openmpi-1.2.3
> > Configured architecture: x86_64-unknown-linux-gnu
> > Configured by: esumbar
> > Configured on: Mon Sep 17 10:00:35 MDT 2007
> > Configure host: opteron-cluster.nic.ualberta.ca
> > Built by: esumbar
> > Built on: Mon Sep 17 10:05:09 MDT 2007
> > Built host: opteron-cluster.nic.ualberta.ca
> > C bindings: yes
> > C++ bindings: yes
> > Fortran77 bindings: yes (all)
> > Fortran90 bindings: yes
> > Fortran90 bindings size: small
> > C compiler: gcc
> > C compiler absolute: /usr/bin/gcc
> > C++ compiler: g++
> > C++ compiler absolute: /usr/bin/g++
> > Fortran77 compiler: gfortran
> > Fortran77 compiler abs: /usr/bin/gfortran
> > Fortran90 compiler: gfortran
> > Fortran90 compiler abs: /usr/bin/gfortran
> > C profiling: yes
> > C++ profiling: yes
> > Fortran77 profiling: yes
> > Fortran90 profiling: yes
> > C++ exceptions: no
> > Thread support: posix (mpi: no, progress: no)
> > Internal debug support: no
> > MPI parameter check: runtime
> > Memory profiling support: no
> > Memory debugging support: no
> > libltdl support: yes
> > Heterogeneous support: yes
> > mpirun default --prefix: no
> > MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> > MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> > MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA coll: self (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA io: romio (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA btl: openib (MCA v1.0, API v1.0.1, Component v1.2.3)
> > MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.3)
> > MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.3)
> > MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> > MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.3)
> > MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.3)
> > MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> > MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.3)
> > MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.3)
> > MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.3)
> > MCA sds: env (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.3)
> > MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.3)
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Gleb.