Hi: I recently tried to build my MPI application against OpenMPI 1.3.3. It worked fine with OMPI 1.2.9, but with OMPI 1.3.3 it hangs partway through. It does a fair amount of communication, but eventually it stops in a Send/Recv point-to-point exchange. If I turn off the openib btl, it runs to completion. Also, I built 1.3.3 with memchecker (which is very nice; thanks to everyone who worked on that!), and that build runs to completion, even with openib active.
Our cluster consists of dual dual-core opteron boxes with Mellanox MT25204 (InfiniHost III Lx) HCAs and a Mellanox MT47396 Infiniscale-III switch. We're running RHEL 4.8 which appears to include OFED 1.4. I've built everything using GCC 4.3.2. Here is the output from ibv_devinfo. "ompi_info --all" is attached.

$ ibv_devinfo
hca_id: mthca0
        fw_ver:                 1.1.0
        node_guid:              0002:c902:0024:3284
        sys_image_guid:         0002:c902:0024:3287
        vendor_id:              0x02c9
        vendor_part_id:         25204
        hw_ver:                 0xA0
        board_id:               MT_03B0140002
        phys_port_cnt:          1
                port:   1
                        state:          active (4)
                        max_mtu:        2048 (4)
                        active_mtu:     2048 (4)
                        sm_lid:         1
                        port_lid:       1
                        port_lmc:       0x00

I'd appreciate any tips for debugging this.

Thanks,
Allen

--
Allen Barnett
Transpire, Inc
E-Mail: al...@transpireinc.com
Skype: allenbarnett
Ph: 518-887-2930
ompinfo.gz
Description: GNU Zip compressed data