See https://github.com/open-mpi/ompi/pull/1439

I was seeing this problem when enabling CUDA support, as that sets
btl_openib_max_send_size to 128k but does not adjust the receive queue
settings to match. I tested the commit in #1439 and it fixes the issue for me.
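
Until that fix lands in a release, one workaround is to bring the two
parameters back into agreement by hand. The receive-queue string below is
only a sketch (per-peer queues with illustrative sizes and counts), so
adjust it for your hardware:

  # either cap the send size at the largest receive buffer
  mpirun --mca btl_openib_max_send_size 65536 ...

  # or make the largest receive-queue buffer at least as large as the send size
  mpirun --mca btl_openib_receive_queues P,65536,256,192,128:P,131072,64,32,16 ...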

-Nathan

On Tue, Mar 08, 2016 at 03:57:39PM +0900, Gilles Gouaillardet wrote:
>    Per the error message, can you try
> 
>    mpirun --mca btl_openib_if_include cxgb3_0 --mca btl_openib_max_send_size 65536 ...
> 
>    and see whether it helps?
> 
>    You can also try various settings for the receive queues: for example, edit
>    your /.../share/openmpi/mca-btl-openib-device-params.ini and set the
>    parameters for your specific hardware.
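> 
>    For reference, an entry in that file looks roughly like the sketch below;
>    the vendor/part IDs and queue values here are only illustrative, so find
>    the section that already matches your card and adjust receive_queues there
>    (the largest buffer must be at least btl_openib_max_send_size):
> 
>    [Chelsio T3]
>    vendor_id = 0x1425
>    vendor_part_id = 0x0030,0x0031,0x0032
>    use_eager_rdma = 0
>    mtu = 2048
>    receive_queues = P,65536,256,192,128:P,131072,64,32,16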
> 
>    Cheers,
> 
>    Gilles
> 
>    On 3/8/2016 2:55 PM, dpchoudh . wrote:
> 
>      Hello all
> 
>      I am asking for help for the following situation:
> 
>      I have two (mostly identical) nodes. Each of them has (completely
>      identical)
>      1. a QLogic 4x DDR InfiniBand card, and
>      2. a Chelsio S310E (T3-based) 10GbE iWARP card.
> 
>      Both are connected back-to-back, without a switch. The connection is
>      physically OK and IP traffic can flow on both of them without issues.
> 
>      The issue is that I can run MPI programs over the openib BTL with the
>      QLogic card, but not with the Chelsio card. Here are the commands:
> 
>      [durga@smallMPI ~]$ ibv_devices
>          device                 node GUID
>          ------              ----------------
>          cxgb3_0             00074306cd3b0000      <-- Chelsio
>          qib0                0011750000ff831d      <-- Qlogic
> 
>      The following command works:
> 
>      mpirun -np 2 --hostfile ~/hostfile -mca btl_openib_if_include qib0 ./osu_acc_latency
> 
>      And the following do not:
>      mpirun -np 2 --hostfile ~/hostfile -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency
> 
>      mpirun -np 2 --hostfile ~/hostfile -mca pml ob1 -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency
> 
>      mpirun -np 2 --hostfile ~/hostfile -mca pml ^cm -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency
> 
>      The error I get is the following (in all of the non-working cases):
> 
>      WARNING: The largest queue pair buffer size specified in the
>      btl_openib_receive_queues MCA parameter is smaller than the maximum
>      send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
>      that no queue is large enough to receive the largest possible incoming
>      message fragment.  The OpenFabrics (openib) BTL will therefore be
>      deactivated for this run.
> 
>        Local host: smallMPI
>        Largest buffer size: 65536
>        Maximum send fragment size: 131072
>      
> --------------------------------------------------------------------------
>      
> --------------------------------------------------------------------------
>      No OpenFabrics connection schemes reported that they were able to be
>      used on a specific port.  As such, the openib BTL (OpenFabrics
>      support) will be disabled for this port.
> 
>        Local host:           bigMPI
>        Local device:         cxgb3_0
>        Local port:           1
>        CPCs attempted:       udcm
>      
> --------------------------------------------------------------------------
> 
>      I have a vague understanding of what the message is trying to say, but I
>      do not know which file or configuration parameters to change to fix the
>      situation.
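> 
>      In case it is useful, the two parameters the warning refers to can be
>      inspected with ompi_info (on recent Open MPI versions the --level 9
>      flag is needed to show them):
> 
>      ompi_info --param btl openib --level 9 | grep -E 'max_send_size|receive_queues'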
> 
>      Thanks in advance
>      Durga
>      Life is complex. It has real and imaginary parts.
> 