Ahhh, that's the piece I was missing. I've been trying to debug everything I could think of related to 'btl', and was completely unaware that 'mtl' was also a transport.
If I run a job using --mca mtl ^psm, it does indeed run properly across all of my nodes. (Whether or not that's the 'right' thing to do is yet to be determined.)

Thanks for your help!

Kevin

-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Dave Love
Sent: Tuesday, October 15, 2013 10:16 AM
To: Open MPI Users
Subject: Re: [OMPI users] Need help running jobs across different IB vendors

"Kevin M. Hildebrand" <ke...@umd.edu> writes:

> Hi, I'm trying to run an OpenMPI 1.6.5 job across a set of nodes, some
> with Mellanox cards and some with Qlogic cards.

Maybe you shouldn't...  (I'm blessed in one cluster with three somewhat
incompatible types of QLogic card and a set of Mellanox ones, but they're
in separate islands, apart from the two different SDR ones.)

> I'm getting errors indicating "At least one pair of MPI processes are unable
> to reach each other for MPI communications". As far as I can tell all of the
> nodes are properly configured and able to reach each other, via IP and non-IP
> connections.
>
> I've also discovered that even if I turn off the IB transport via "--mca btl
> tcp,self" I'm still getting the same issue.
>
> The test works fine if I run it confined to hosts with identical IB cards.
>
> I'd appreciate some assistance in figuring out what I'm doing wrong.

I assume the QLogic cards are using PSM.  You'd need to force them to use
openib with something like --mca mtl ^psm and make sure they have the
ipathverbs library available.  You probably won't like the resulting
performance -- users here noticed when one set fell back to openib from
psm recently.
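For reference, a minimal sketch of the invocations discussed above. The hostfile name, process count, and executable are placeholders, and the explicit pml/btl variant is the usual way of pinning Open MPI 1.6.x to the verbs path rather than something quoted from this thread:

  # Exclude the PSM MTL so the QLogic nodes fall back to the openib BTL,
  # letting both vendors talk plain verbs (hostfile/binary names are made up):
  mpirun --mca mtl ^psm --hostfile hosts.txt -np 16 ./mpi_test

  # More explicit variant: force the ob1 PML and name the BTLs, so neither
  # side can silently pick the cm/PSM path:
  mpirun --mca pml ob1 --mca btl openib,sm,self --hostfile hosts.txt -np 16 ./mpi_test

  # Check which MTL/BTL components each node's build actually contains:
  ompi_info | grep -E "MCA (mtl|btl)"

As Dave notes, expect a performance hit on the QLogic side once PSM is out of the picture.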