Sergei,

is there any reason why you configure with --with-verbs-libdir=/usr/lib ?
as far as I understand, --with-verbs should be enough, and neither /usr/lib
nor /usr/local/lib should ever be used on the configure command line.
(and btw, are you running on a 32-bit system ? should the 64-bit
libs be in /usr/lib64 ?)
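
for example, a configure line like this should be enough (the install
prefix below is just a placeholder, adjust it to your setup) :

$ ./configure --prefix=/opt/openmpi-1.10.2 --with-verbs
$ make && make install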

make sure you run
ulimit -l unlimited
before you invoke mpirun, and that this value is correctly propagated to
the remote nodes.
/* the failure could be a side effect of a low ulimit -l */
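
a quick way to check that the limit is really propagated is to run a
trivial command through mpirun (a minimal sketch, assuming bash is
available on the nodes; node2 below is just a placeholder for your
actual host list) :

$ mpirun --host node1,node2 bash -c 'echo $(hostname): $(ulimit -l)'

if a node reports a small value (typically 64 KB by default), raise the
memlock limit in /etc/security/limits.conf on that node, and restart the
ssh session or resource manager daemon so the new limit takes effect.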

Cheers,

Gilles


On Fri, Oct 28, 2016 at 6:48 PM, Sergei Hrushev <hrus...@gmail.com> wrote:
> Hello, All !
>
> We have a problem with OpenMPI version 1.10.2 on a cluster with newly
> installed Mellanox InfiniBand adapters.
> OpenMPI was re-configured and re-compiled using: --with-verbs
> --with-verbs-libdir=/usr/lib
>
> Our test MPI task returns proper results, but it seems OpenMPI continues
> to use the existing 1Gbit Ethernet network instead of InfiniBand.
>
> An output file contains these lines:
> --------------------------------------------------------------------------
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port.  As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:           node1
>   Local device:         mlx4_0
>   Local port:           1
>   CPCs attempted:       rdmacm, udcm
> --------------------------------------------------------------------------
>
> The InfiniBand network itself seems to be working:
>
> $ ibstat mlx4_0 shows:
>
> CA 'mlx4_0'
>         CA type: MT4099
>         Number of ports: 1
>         Firmware version: 2.35.5100
>         Hardware version: 0
>         Node GUID: 0x7cfe900300bddec0
>         System image GUID: 0x7cfe900300bddec3
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 56
>                 Base lid: 3
>                 LMC: 0
>                 SM lid: 3
>                 Capability mask: 0x0251486a
>                 Port GUID: 0x7cfe900300bddec1
>                 Link layer: InfiniBand
>
> ibping also works.
> ibnetdiscover shows the correct topology of the IB network.
>
> The cluster runs Ubuntu 16.04 and we use the drivers that ship with the OS
> (OFED is not installed).
>
> Is it enough for OpenMPI to have RDMA only, or should IPoIB also be
> installed?
> What else can be checked?
>
> Thanks a lot for any help!
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
