Hello, all!

We have a problem with OpenMPI version 1.10.2 on a cluster with newly
installed Mellanox InfiniBand adapters.
OpenMPI was re-configured and re-compiled using: --with-verbs
--with-verbs-libdir=/usr/lib
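
To confirm that the rebuilt OpenMPI actually contains the openib BTL, we can
check with ompi_info (assuming the ompi_info in PATH is the one from this
build), for example:

$ ompi_info | grep openib
$ ompi_info --param btl openib --level 9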

Our test MPI job returns correct results, but OpenMPI seems to keep using the
existing 1 Gbit Ethernet network instead of InfiniBand.
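
To rule out a silent fallback to TCP, we could force the openib BTL and raise
the verbosity, e.g. (the hostfile and executable names below are just
placeholders):

$ mpirun --mca btl openib,self,sm --mca btl_base_verbose 30 \
         -np 2 -hostfile hosts ./mpi_test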

An output file contains these lines:
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           node1
  Local device:         mlx4_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
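
If I understand this message correctly, both connection managers (rdmacm and
udcm) failed to initialize on the port. One common cause we could check first
is the locked-memory limit of the user running the MPI processes (the
limits.conf lines below are only an example):

$ ulimit -l          # should be "unlimited" (or very large) on every node

# e.g. in /etc/security/limits.conf:
*   soft   memlock   unlimited
*   hard   memlock   unlimited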

The InfiniBand network itself seems to be working:

$ ibstat mlx4_0 shows:

CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x7cfe900300bddec0
        System image GUID: 0x7cfe900300bddec3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 3
                LMC: 0
                SM lid: 3
                Capability mask: 0x0251486a
                Port GUID: 0x7cfe900300bddec1
                Link layer: InfiniBand

ibping also works, and ibnetdiscover shows the correct topology of the IB
network.
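
ibping and ibnetdiscover exercise the fabric at the management (MAD) level; to
check that the verbs layer used by the openib BTL also works between two
nodes, something like ibv_devinfo and ibv_rc_pingpong from ibverbs-utils could
be tried (the node names are placeholders):

$ ibv_devinfo -d mlx4_0          # port state should be PORT_ACTIVE

# on node1 (server):
$ ibv_rc_pingpong -d mlx4_0
# on node2 (client):
$ ibv_rc_pingpong -d mlx4_0 node1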

The cluster runs Ubuntu 16.04, and we use the drivers that ship with the OS
(OFED is not installed).

Is it enough for OpenMPI to have RDMA only, or should IPoIB also be set up?
What else can be checked?
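
In case IPoIB turns out to be required (as far as I understand, the rdmacm CPC
needs an IP address on the IB port, while udcm should not), a minimal ifupdown
stanza for Ubuntu 16.04 might look like this (the interface name ib0 and the
address are assumptions for our setup):

# /etc/network/interfaces.d/ib0  (example only)
auto ib0
iface ib0 inet static
    address 192.168.100.1
    netmask 255.255.255.0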

Thanks a lot for any help!