Hi Bill,

On 08-Jul-11 7:59 PM, Bill Johnstone wrote:
> Hello, and thanks for the reply.
>
> ----- Original Message -----
>> From: Jeff Squyres <jsquy...@cisco.com>
>> Sent: Thursday, July 7, 2011 5:14 PM
>> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
>>
>> On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote:
>>
>>> I have a heterogeneous network of InfiniBand-equipped hosts which are all
>>> connected to the same backbone switch, an older SDR 10 Gb/s unit.
>>>
>>> One set of nodes uses the Mellanox "ib_mthca" driver, while the other
>>> uses the "mlx4" driver.
>>>
>>> This is on Linux 2.6.32, with Open MPI 1.5.3.
>>>
>>> When I run Open MPI across these node types, I get an error message of
>>> the form:
>>>
>>> Open MPI detected two different OpenFabrics transport types in the same
>>> Infiniband network.
>>> Such mixed network trasport configuration is not supported by Open MPI.
>>>
>>>   Local host:            compute-chassis-1-node-01
>>>   Local adapter:         mthca0 (vendor 0x5ad, part ID 25208)
>>>   Local transport type:  MCA_BTL_OPENIB_TRANSPORT_UNKNOWN
>>
>> Wow, that's cool ("UNKNOWN").  Are you using an old version of OFED or
>> something?
>
> No, clean local build of OFED 1.5.3 packages, but I don't have the full
> huge complement of OFED packages installed, since our setup is not using
> IPoIB, SDP, etc.
>
> ibdiagnet, and all the usual suspects work as expected, and I'm able to do
> large scale Open MPI runs just fine, so long as I don't cross Mellanox HCA
> types.
>
>> Mellanox -- how can this happen?
>>
>>>   Remote host:           compute-chassis-3-node-01
>>>   Remote Adapter:        (vendor 0x2c9, part ID 26428)
>>>   Remote transport type: MCA_BTL_OPENIB_TRANSPORT_IB
>>>
>>> Two questions:
>>>
>>> 1. Why is this occurring if both adapters have all the OpenIB software
>>> set up?  Is it because Open MPI is trying to use functionality such as
>>> ConnectX with the newer hardware, which is incompatible with older
>>> hardware, or is it something more mundane?
>>
>> It's basically a mismatch of IB capabilities -- Open MPI is trying to use
>> more advanced features in some nodes and not in others.
>
> I also tried looking in the adapter-specific settings in the .ini file
> under /etc, but the only difference I found was in MTU, and I think that's
> configured on the switch.
>
>>> 2. How can I use IB amongst these heterogeneous nodes?
>>
>> Mellanox will need to answer this question...  It might be able to be
>> done, but I don't know how offhand.  The first issue is to figure out why
>> you're getting TRANSPORT_UNKNOWN on the one node.
>
> OK, please let me know what other things to try or what other info I can
> provide.
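For reference, the transport type each HCA actually reports can be checked on each node with ibv_devinfo (from libibverbs, installed with OFED); a sketch, with the hostnames taken from the error message above and the ssh/grep wrapping purely illustrative:

```shell
# On the older node with the ib_mthca-driven HCA:
ssh compute-chassis-1-node-01 'ibv_devinfo | grep -E "hca_id|transport"'

# On the newer node with the mlx4-driven HCA:
ssh compute-chassis-3-node-01 'ibv_devinfo | grep -E "hca_id|transport"'
```

Both HCAs would be expected to report a "transport: InfiniBand" line; anything else on the mthca node would line up with the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN shown in the error.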
I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.

One question though, just to make sure we're on the same page: the jobs do
run OK on the older HCAs, as long as they run *only* on the older HCAs,
right?  Please make sure that those jobs are using only IB, with the
"--mca btl openib,self" parameter.

-- YK

> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
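A minimal sketch of the kind of check being suggested, assuming a hostfile that lists only the older mthca nodes (the hostfile name, process count, and application name are illustrative):

```shell
# Restrict Open MPI to the openib (InfiniBand) and self BTLs, so the run
# fails loudly instead of silently falling back to TCP; run it only on
# the older HCAs to confirm they work in isolation:
mpirun --mca btl openib,self -np 16 --hostfile mthca_nodes ./my_mpi_app
```

If this run succeeds while a mixed mthca/mlx4 run still fails with the transport-type error, that isolates the problem to the cross-generation case rather than the older HCAs themselves.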