Ahhh, that's the piece I was missing.  I've been trying to debug everything I
could think of related to 'btl', and was completely unaware that 'mtl' is a
separate transport framework.
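
For anyone else who hits this, 'ompi_info' will list the components in both
frameworks -- the exact output will of course depend on how your Open MPI was
built:

    # show the BTL and MTL components this build knows about
    ompi_info | grep "MCA btl"
    ompi_info | grep "MCA mtl"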

If I run a job using --mca mtl ^psm, it does indeed run properly across all of 
my nodes.  (Whether or not that's the 'right' thing to do is yet to be 
determined.)
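
For the archives, the working invocation looks roughly like this (the hostfile,
process count, and test binary below are placeholders, not my actual job):

    mpirun --mca mtl ^psm -np 16 --hostfile ./hosts ./mpi_test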

Thanks for your help!

Kevin


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Dave Love
Sent: Tuesday, October 15, 2013 10:16 AM
To: Open MPI Users
Subject: Re: [OMPI users] Need help running jobs across different IB vendors

"Kevin M. Hildebrand" <ke...@umd.edu> writes:

> Hi, I'm trying to run an OpenMPI 1.6.5 job across a set of nodes, some
> with Mellanox cards and some with Qlogic cards.

Maybe you shouldn't...  (I'm blessed in one cluster with three somewhat
incompatible types of QLogic card and a set of Mellanox ones, but
they're in separate islands, apart from the two different SDR ones.)

> I'm getting errors indicating "At least one pair of MPI processes are unable 
> to reach each other for MPI communications".  As far as I can tell all of the 
> nodes are properly configured and able to reach each other, via IP and non-IP 
> connections.
> I've also discovered that even if I turn off the IB transport via "--mca btl 
> tcp,self" I'm still getting the same issue.
> The test works fine if I run it confined to hosts with identical IB cards.
> I'd appreciate some assistance in figuring out what I'm doing wrong.

I assume the QLogic cards are using PSM.  You'd need to force them to
use openib with something like --mca mtl ^psm and make sure they have
the ipathverbs library available.  You probably won't like the resulting
performance -- users here noticed when one set fell back to openib from
psm recently.
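
A couple of quick sanity checks on the QLogic nodes, if it helps (the hostfile
and binary below are placeholders):

    # is the verbs provider for the InfiniPath/QLogic HCAs installed?
    ldconfig -p | grep ipathverbs

    # watch which BTLs actually get selected once psm is excluded
    mpirun --mca mtl ^psm --mca btl_base_verbose 30 \
           -np 2 --hostfile ./hosts ./mpi_test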