Hi,

I'm a system administrator trying to help users resolve hangs in the Gadget 2 code during MPI_Sendrecv (similar to http://www.open-mpi.org/community/lists/users/2010/05/13057.php). I'm trying to determine an appropriate value for mpool_rdma_rcache_size_limit on our hardware, and to make sure the RDMA settings are appropriate and do not lead to data corruption (http://www.open-mpi.org/faq/?category=openfabrics#setting-mpi-leave-pinned-1.3.2). The Gadget code ran fine under Open MPI 1.2.9; the hangs show up in 1.4.3 and also in 1.3.2.

code runs using tcp (-mca btl tcp,self,sm)
code hangs using infiniband
code runs using infiniband with "-mca btl_openib_flags 1" and "-mca mpool_rdma_rcache_size_limit 209715200" (suggestion from poster from the referenced link above)

Any suggestions would be appreciated.

Regards,
Gretchen

0. openmpi 1.4.3 (ompi_info attached, config.log is missing but may not be needed as this is a more general usage/settings question)
1. OFED 1.4.2 from git.openfabrics.org
2. Debian 5.0, kernel 2.6.26-2-amd64
3. opensm-3.2.6
4. ibv_devinfo
   hca_id: mlx4_0
           fw_ver:            2.6.000
           node_guid:         0002:c903:0002:848c
           sys_image_guid:    0002:c903:0002:848f
           vendor_id:         0x02c9
           vendor_part_id:    25408
           hw_ver:            0xA0
           board_id:          MT_04A0130005
           phys_port_cnt:     2
                   port: 1
                           state:       PORT_ACTIVE (4)
                           max_mtu:     2048 (4)
                           active_mtu:  2048 (4)
                           sm_lid:      30
                           port_lid:    99
                           port_lmc:    0x00
5. ifconfig
   ib0  Link encap:UNSPEC  HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
        inet addr:10.16.10.20  Bcast:10.16.10.255  Mask:255.255.255.0
        inet6 addr: fe80::202:c903:2:848d/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
        RX packets:1936 errors:0 dropped:0 overruns:0 frame:0
        TX packets:0 errors:0 dropped:5 overruns:0 carrier:0
        collisions:0 txqueuelen:256
        RX bytes:189055 (184.6 KiB)  TX bytes:0 (0.0 B)
6. unlimited
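
To make the three reported cases easier to reproduce, here is a rough sketch that writes them out as full mpirun command lines. Only the -mca parameters and their values come from the report above. The process count (16), the executable name (./Gadget2), the parameter file (run.param), and the explicit BTL list "openib,self,sm" for the InfiniBand runs are placeholders assumed for illustration. The sketch also shows that 209715200 bytes is 200 MiB.

```python
import shlex

MIB = 1024 * 1024


def mpirun_command(btl, extra_mca=None, np=16,
                   program="./Gadget2", args=("run.param",)):
    """Build an mpirun argument list.

    np, program and args are hypothetical placeholders, not values
    taken from the actual job.
    """
    cmd = ["mpirun", "-np", str(np), "-mca", "btl", btl]
    for name, value in (extra_mca or {}).items():
        cmd += ["-mca", name, str(value)]
    return cmd + [program, *args]


# 200 MiB expressed in bytes; this is the value suggested in the referenced post.
rcache_limit = 200 * MIB

cases = {
    "tcp (runs)": mpirun_command("tcp,self,sm"),
    # Assumption: the InfiniBand runs select the openib BTL explicitly.
    "openib defaults (hangs)": mpirun_command("openib,self,sm"),
    "openib workaround (runs)": mpirun_command(
        "openib,self,sm",
        {
            "btl_openib_flags": 1,
            "mpool_rdma_rcache_size_limit": rcache_limit,
        },
    ),
}

# Print the commands instead of running them, so this works without MPI.
for label, cmd in cases.items():
    print(f"{label}: {shlex.join(cmd)}")
```

The commands are only printed, so the script can be run on a machine without Open MPI installed and the output pasted into a job script after replacing the placeholders.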

Package: Open MPI xxx@xxx Distribution
Open MPI: 1.4.3
Open MPI SVN revision: r23834
Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
Open RTE SVN revision: r23834
Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
OPAL SVN revision: r23834
OPAL release date: Oct 05, 2010
Ident string: 1.4.3
Prefix: /usr/local/openmpi-1.4.3
Configured architecture: x86_64-unknown-linux-gnu
Configure host: xxx
Configured by: xxx
Configured on: Tue Nov 30 16:24:27 EST 2010
Configure host: xxx
Built by: xxx
Built on: Tue Nov 30 16:31:33 EST 2010
Built host: xxx
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: yes
Thread support: posix (mpi: no, progress: no)
Sparse Groups: no
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol visibility support: yes
FT Checkpoint support: no (checkpoint thread: no)
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.3)
MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.3)
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.3)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.3)
MCA carto: file (MCA v2.0, API v2.0, Component v1.4.3)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.3)
MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.4.3)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.3)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.3)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.3)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.3)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.3)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.3)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: self (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.3)
MCA io: romio (MCA v2.0, API v2.0, Component v1.4.3)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.3)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.3)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.3)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.3)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.3)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.3)
MCA pml: v (MCA v2.0, API v2.0, Component v1.4.3)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.3)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: ofud (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: self (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.3)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.3)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.3)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.3)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.3)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.3)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.3)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.3)
MCA odls: default (MCA v2.0, API v2.0, Component v1.4.3)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4.3)
MCA ras: tm (MCA v2.0, API v2.0, Component v1.4.3)
MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4.3)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4.3)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.3)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.3)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.3)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.3)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.4.3)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.3)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4.3)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4.3)
MCA plm: tm (MCA v2.0, API v2.0, Component v1.4.3)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4.3)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: env (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.4.3)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4.3)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4.3)