Alternatively, if Open MPI really is trying to use both ports, you could force it to use just the IB port (port 1) with --mca btl_openib_if_include mlx4_0:1 (probably).
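If the port numbering differs between machines, the include list could be derived from ibv_devinfo rather than hard-coded. Below is a minimal sketch, assuming the ibv_devinfo text layout quoted further down (older OFED prints "link_layer: IB", newer rdma-core releases print "InfiniBand"). The function name ib_ports is hypothetical, and the sketch assumes every host in mpihosts has the same port layout as the node where it runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ibv_devinfo output on stdin and print the
# "device:port" pairs whose link layer is InfiniBand, comma-separated,
# in the form btl_openib_if_include expects (e.g. mlx4_0:1).
# Assumes the plain-text ibv_devinfo layout shown in this thread.
ib_ports() {
  awk '
    $1 == "hca_id:" { dev = $2 }     # a new HCA block starts
    $1 == "port:"   { port = $2 }    # a new port sub-block starts
    # Keep only ports whose link layer is IB (older) or InfiniBand (newer).
    $1 == "link_layer:" && ($2 == "IB" || $2 == "InfiniBand") {
      out = out (out == "" ? "" : ",") dev ":" port
    }
    END { if (out != "") print out }
  '
}
```

On the ibv_devinfo output below this should print mlx4_0:1, which could then be passed as --mca btl_openib_if_include "$(ibv_devinfo | ib_ports)" on the mpiexec command line.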
--
Mike Shuey

On Tue, Mar 1, 2011 at 1:02 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On Feb 28, 2011, at 12:49 PM, Jagga Soorma wrote:
>
>> -bash-3.2$ mpiexec --mca btl openib,self -mca btl_openib_warn_default_gid_prefix 0 -np 2 --hostfile mpihosts /home/jagga/osu-micro-benchmarks-3.3/openmpi/ofed-1.5.2/bin/osu_latency
>
> Your use of btl_openib_warn_default_gid_prefix may have brought up a subtle
> issue in Open MPI's verbs support. More below.
>
>> # OSU MPI Latency Test v3.3
>> # Size    Latency (us)
>> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:325:qp_connect_all] error modifing QP to RTR errno says Invalid argument
>> [amber04][[10252,1],1][connect/btl_openib_connect_oob.c:815:rml_recv_cb] error in endpoint reply start connect
>
> Looking at this error message and your ibv_devinfo output:
>
>> [root@amber03 ~]# ibv_devinfo
>> hca_id: mlx4_0
>>         transport:      InfiniBand (0)
>>         fw_ver:         2.7.9294
>>         node_guid:      78e7:d103:0021:8884
>>         sys_image_guid: 78e7:d103:0021:8887
>>         vendor_id:      0x02c9
>>         vendor_part_id: 26438
>>         hw_ver:         0xB0
>>         board_id:       HP_0200000003
>>         phys_port_cnt:  2
>>                 port:   1
>>                         state:       PORT_ACTIVE (4)
>>                         max_mtu:     2048 (4)
>>                         active_mtu:  2048 (4)
>>                         sm_lid:      1
>>                         port_lid:    20
>>                         port_lmc:    0x00
>>                         link_layer:  IB
>>
>>                 port:   2
>>                         state:       PORT_ACTIVE (4)
>>                         max_mtu:     2048 (4)
>>                         active_mtu:  1024 (3)
>>                         sm_lid:      0
>>                         port_lid:    0
>>                         port_lmc:    0x00
>>                         link_layer:  Ethernet
>
> It looks like you have one HCA port as IB and the other as Ethernet.
>
> I'm wondering if OMPI is not taking the device transport into account and
> is *only* using the subnet ID to determine reachability (i.e., I'm
> wondering if we didn't anticipate multiple devices/ports with the same
> subnet ID but with different transports). I pointed this out to Mellanox
> yesterday; I think they're following up on it.
>
> In the meantime, a workaround might be to set a non-default subnet ID on
> your IB network.
> That should allow Open MPI to tell these networks apart without
> additional help.
>
> --
> Jeff Squyres
> jsquy...@cisco.com