Egor,

If updating OFED doesn't solve the problem (and I kinda have the
feeling that it will), you might want to try this mailing list
for IB interoperability questions:
linux-r...@vger.kernel.org

-- YK

On 26-Aug-11 4:42 PM, Shamis, Pavel wrote:
> You may try to update your OFED version. I think 1.5.3 is the latest one.
> 
> Pavel (Pasha) Shamis
> ---
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
> 
> On Aug 25, 2011, at 7:46 PM, <worl...@ukr.net> wrote:
> 
>>
>> Hi all,
>>
>> This is more of a hardware or system configuration question, but
>> I hope people on this list have some experience with it.
>> I have just added a new ConnectX IB card to a cluster with InfiniHost cards,
>> and now no MPI programs work. Even OFED's own tests do not work.
>> For example, ib_send_* and ib_write_* just segfault on the host with the
>> ConnectX card and keep waiting on the hosts with InfiniHost cards.
>> The rdma_lat/bw tests segfault too, but with messages like this on the
>> InfiniHost hosts:
>> server read: No such file or directory
>> 5924:pp_server_exch_dest: 0/45 Couldn't read remote address
>>
>> pp_read_keys: No such file or directory
>> Couldn't read remote address
>>
>> Other diagnostic tools like ibv_device, ibchecknet, ibstat, ibstatus... show
>> no errors and show the ConnectX card in the system. All modules (mlx4_*, rdma_*)
>> are loaded. IPoIB is configured.
>> The openibd and opensmd services started without errors.
>>
>> 08:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
>> OFED is 1.3.1, CentOS 5.2.
>>
>> ibstat
>> CA 'mlx4_0'
>>         CA type: MT26428
>>         Number of ports: 1
>>         Firmware version: 2.7.0
>>         Hardware version: a0
>>         Node GUID: 0x0002c903000cad14
>>         System image GUID: 0x0002c903000cad17
>>         Port 1:
>>                 State: Active
>>                 Physical state: LinkUp
>>                 Rate: 20
>>                 Base lid: 60
>>                 LMC: 0
>>                 SM lid: 60
>>                 Capability mask: 0x0251086a
>>                 Port GUID: 0x0002c903000cad15
>>
>> Where is the problem?
>>
>> Thanks in advance,
>> Egor.
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
> 
> 
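
For anyone comparing ports across the mixed ConnectX/InfiniHost hosts, here is a minimal sketch (not part of the original thread) that parses `ibstat`-style text, such as the output quoted above, into a dictionary so the per-port fields can be checked side by side. The function names `parse_ibstat` and `port_summary` are hypothetical, and the parser assumes the simple "Key: value" layout shown in the quoted output; other OFED versions may print it differently. The `sm_local` flag only reports whether "SM lid" equals "Base lid", which usually (an assumption here, not a diagnosis) means the subnet manager runs on that same port.

```python
import re

# Sample text copied from the ibstat output quoted in this thread.
IBSTAT_SAMPLE = """\
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 1
        Firmware version: 2.7.0
        Hardware version: a0
        Node GUID: 0x0002c903000cad14
        System image GUID: 0x0002c903000cad17
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 20
                Base lid: 60
                LMC: 0
                SM lid: 60
                Capability mask: 0x0251086a
                Port GUID: 0x0002c903000cad15
"""


def parse_ibstat(text):
    """Parse ibstat-style text into {ca_name: {"fields": {...}, "ports": {n: {...}}}}."""
    cas = {}
    ca = None        # the CA record currently being filled
    target = None    # dict that receives the next "Key: value" line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = re.match(r"CA '(.+)'$", line)
        if m:  # start of a new channel adapter section
            ca = {"fields": {}, "ports": {}}
            cas[m.group(1)] = ca
            target = ca["fields"]
            continue
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:  # start of a port subsection
            target = ca["ports"].setdefault(int(m.group(1)), {})
            continue
        if target is not None and ":" in line:
            key, _, value = line.partition(":")
            target[key.strip()] = value.strip()
    return cas


def port_summary(cas):
    """Return one human-readable line per port with the fields worth comparing."""
    lines = []
    for name, ca in cas.items():
        for num, port in sorted(ca["ports"].items()):
            sm_local = port.get("Base lid") == port.get("SM lid")
            lines.append(
                f"{name} port {num}: state={port.get('State')} "
                f"rate={port.get('Rate')} sm_local={sm_local}"
            )
    return lines


if __name__ == "__main__":
    for line in port_summary(parse_ibstat(IBSTAT_SAMPLE)):
        print(line)
```

Running the same script against saved `ibstat` output from an InfiniHost host would give a quick way to compare state and rate between the two card types before and after an OFED update.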
