Hi Bill,

On 08-Jul-11 7:59 PM, Bill Johnstone wrote:
> Hello, and thanks for the reply.
> 
> 
> 
> ----- Original Message -----
>> From: Jeff Squyres <jsquy...@cisco.com>
>> Sent: Thursday, July 7, 2011 5:14 PM
>> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
>>
>> On Jun 28, 2011, at 1:46 PM, Bill Johnstone wrote:
>>
>>>   I have a heterogeneous network of InfiniBand-equipped hosts which are all
>>>   connected to the same backbone switch, an older SDR 10 Gb/s unit.
>>>
>>>   One set of nodes uses the Mellanox "ib_mthca" driver, while the
>>>   other uses the "mlx4" driver.
>>>
>>>   This is on Linux 2.6.32, with Open MPI 1.5.3.
>>>
>>>   When I run Open MPI across these node types, I get an error message of
>>>   the form:
>>>
>>>   Open MPI detected two different OpenFabrics transport types in the same
>>>   InfiniBand network.
>>>   Such mixed network transport configuration is not supported by Open MPI.
>>>
>>>   Local host: compute-chassis-1-node-01
>>>   Local adapter: mthca0 (vendor 0x5ad, part ID 25208)
>>>   Local transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN
>>
>> Wow, that's cool ("UNKNOWN").  Are you using an old version of
>> OFED or something?
> 
> No, it's a clean local build of OFED 1.5.3 packages, but I don't have the
> full complement of OFED packages installed, since our setup is not using
> IPoIB, SDP, etc.
> 
> ibdiagnet and all the usual suspects work as expected, and I'm able to do
> large-scale Open MPI runs just fine, so long as I don't cross Mellanox HCA
> types.
> 
> 
>> Mellanox -- how can this happen?
>>
>>>   Remote host: compute-chassis-3-node-01
>>>   Remote adapter: (vendor 0x2c9, part ID 26428)
>>>   Remote transport type: MCA_BTL_OPENIB_TRANSPORT_IB
>>>
>>>   Two questions:
>>>
>>>   1. Why is this occurring if both adapters have all the OpenIB software
>>>   set up?  Is it because Open MPI is trying to use functionality such as
>>>   ConnectX with the newer hardware, which is incompatible with older
>>>   hardware, or is it something more mundane?
>>
>> It's basically a mismatch of IB capabilities -- Open MPI is trying to use
>> more advanced features in some nodes and not in others.
> 
> I also looked at the adapter-specific settings in the .ini file under /etc,
> but the only difference I found was in the MTU, and I think that's
> configured on the switch.
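
For reference, the file you mean is presumably mca-btl-openib-device-params.ini;
its per-device entries look roughly like this (an illustrative sketch -- the
vendor/part IDs and values here are examples, not authoritative for any
particular release):

  [Mellanox Hermon]
  vendor_id = 0x2c9,0x5ad
  vendor_part_id = 26428
  use_eager_rdma = 1
  mtu = 2048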
> 
>>>   2. How can I use IB amongst these heterogeneous nodes?
>>
>> Mellanox will need to answer this question...  It might be possible, but I
>> don't know how offhand.  The first issue is to figure out why you're
>> getting TRANSPORT_UNKNOWN on the one node.
> 
> OK, please let me know what other things to try or what other info I can 
> provide.

I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.
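
In the meantime, if you want to see what libibverbs itself reports for each
HCA, a small diagnostic like the one below may help (an untested sketch, not
Open MPI code; the openib BTL's transport detection is based in part on this
transport_type field, so a device reporting "UNKNOWN" here would be consistent
with the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN message):

  /* transport_check.c -- print the libibverbs transport type per HCA.
   * Build: gcc transport_check.c -o transport_check -libverbs
   */
  #include <stdio.h>
  #include <infiniband/verbs.h>

  int main(void)
  {
      int num_devices, i;
      /* Enumerate all RDMA devices known to libibverbs. */
      struct ibv_device **devices = ibv_get_device_list(&num_devices);

      if (devices == NULL) {
          perror("ibv_get_device_list");
          return 1;
      }
      for (i = 0; i < num_devices; i++) {
          /* transport_type is IBV_TRANSPORT_UNKNOWN (-1) when the
           * stack cannot classify the device. */
          const char *transport =
              devices[i]->transport_type == IBV_TRANSPORT_IB    ? "IB" :
              devices[i]->transport_type == IBV_TRANSPORT_IWARP ? "iWARP" :
                                                                  "UNKNOWN";
          printf("%s: transport type %s\n",
                 ibv_get_device_name(devices[i]), transport);
      }
      ibv_free_device_list(devices);
      return 0;
  }

Running that on both an mthca node and an mlx4 node would show whether the
"UNKNOWN" comes from the verbs layer or from Open MPI's own logic.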
One question though, just to make sure we're on the same page: the jobs do run
OK on the older HCAs, as long as they run *only* on the older HCAs, right?
Please make sure that the jobs are using only IB, by running with the
"--mca btl openib,self" parameters.
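
For example (the process count and executable name are just placeholders):

  mpirun --mca btl openib,self -np 16 ./your_mpi_app

That rules out the sm and tcp BTLs, so all traffic between ranks has to go
through the openib BTL.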

-- YK
