Per the error message, can you try

mpirun --mca btl_openib_if_include cxgb3_0 --mca btl_openib_max_send_size 65536 ...

and see whether that helps?
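The idea: the warning says the maximum send fragment (131072 bytes) is larger than the largest receive buffer (65536 bytes), so capping btl_openib_max_send_size at 65536 makes every fragment fit. You could also go the other way and enlarge the receive buffers instead, something along these lines (the queue sizes here are illustrative, not tuned values):

mpirun --mca btl_openib_if_include cxgb3_0 --mca btl_openib_receive_queues P,131072,256,192,128 ...

That receive_queues value means one per-peer ("P") queue of 256 buffers of 131072 bytes each, with a low watermark of 192 and a credit window of 128.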

You can also try various settings for the receive queues; for example, edit your /.../share/openmpi/mca-btl-openib-device-params.ini and set the parameters for your specific hardware, along the lines of the sketch below.
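Entries in that file are keyed by the adapter's PCI IDs. A minimal sketch of what such a section might look like (the part ID and queue sizes are placeholders; check your card's actual values with ibv_devinfo and adjust):

[Chelsio T3]
vendor_id = 0x1425
vendor_part_id = 0x0030
use_eager_rdma = 1
mtu = 2048
receive_queues = P,65536,256,192,128

Also note that the second error below says only the udcm CPC was attempted; iWARP adapters generally need the RDMA CM connection manager, so it may also be worth forcing it:

mpirun --mca btl_openib_cpc_include rdmacm ...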

Cheers,

Gilles

On 3/8/2016 2:55 PM, dpchoudh . wrote:
Hello all

I am asking for help with the following situation:

I have two (mostly identical) nodes. Each of them has (completely identical):
1. a QLogic 4X DDR InfiniBand HCA, AND
2. a Chelsio S310E (T3-based) 10GbE iWARP card.

Both are connected back-to-back, without a switch. The connections are physically OK, and IP traffic flows on both of them without issues.

The issue is that I can run MPI programs over the openib BTL with the QLogic card, but not with the Chelsio card. Here are the commands:

[durga@smallMPI ~]$ ibv_devices
    device                 node GUID
    ------              ----------------
    cxgb3_0             00074306cd3b0000      <-- Chelsio
    qib0                0011750000ff831d <-- Qlogic

The following command works:

mpirun -np 2 --hostfile ~/hostfile -mca btl_openib_if_include qib0 ./osu_acc_latency

And the following do not:

mpirun -np 2 --hostfile ~/hostfile -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency

mpirun -np 2 --hostfile ~/hostfile -mca pml ob1 -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency

mpirun -np 2 --hostfile ~/hostfile -mca pml ^cm -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency

The error I get is the following (in all of the non-working cases):

WARNING: The largest queue pair buffer size specified in the
btl_openib_receive_queues MCA parameter is smaller than the maximum
send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
that no queue is large enough to receive the largest possible incoming
message fragment.  The OpenFabrics (openib) BTL will therefore be
deactivated for this run.

  Local host: smallMPI
  Largest buffer size: 65536
  Maximum send fragment size: 131072
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           bigMPI
  Local device:         cxgb3_0
  Local port:           1
  CPCs attempted:       udcm
--------------------------------------------------------------------------

I have a vague understanding of what the message is trying to say, but I do not know which file or configuration parameters to change to fix the situation.

Thanks in advance
Durga


Life is complex. It has real and imaginary parts.

