See https://github.com/open-mpi/ompi/pull/1439
I was seeing this problem when enabling CUDA support, because it sets
btl_openib_max_send_size to 128k but does not change the receive queue
settings. I tested the commit in #1439 and it fixes the issue for me.

-Nathan

On Tue, Mar 08, 2016 at 03:57:39PM +0900, Gilles Gouaillardet wrote:
> Per the error message, can you try
>
>     mpirun --mca btl_openib_if_include cxgb3_0 --mca btl_openib_max_send_size 65536 ...
>
> and see whether it helps?
>
> You can also try various settings for the receive queue, for example edit
> your /.../share/openmpi/mca-btl-openib-device-params.ini and set the
> parameters for your specific hardware.
>
> Cheers,
>
> Gilles
>
> On 3/8/2016 2:55 PM, dpchoudh . wrote:
>
> Hello all
>
> I am asking for help with the following situation:
>
> I have two (mostly identical) nodes. Each of them has (completely
> identical)
> 1. a QLogic 4x DDR InfiniBand card, AND
> 2. a Chelsio S310E (T3-based) 10GbE iWARP card.
>
> Both are connected back-to-back, without a switch. The connection is
> physically OK and IP traffic flows on both of them without issues.
>
> The issue is that I can run MPI programs over the openib BTL using the
> QLogic card, but not the Chelsio card. Here are the commands:
>
> [durga@smallMPI ~]$ ibv_devices
>     device          node GUID
>     ------          ----------------
>     cxgb3_0         00074306cd3b0000   <-- Chelsio
>     qib0            0011750000ff831d   <-- QLogic
>
> The following command works:
>
> mpirun -np 2 --hostfile ~/hostfile -mca btl_openib_if_include qib0 ./osu_acc_latency
>
> And the following do not:
>
> mpirun -np 2 --hostfile ~/hostfile -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency
>
> mpirun -np 2 --hostfile ~/hostfile -mca pml ob1 -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency
>
> mpirun -np 2 --hostfile ~/hostfile -mca pml ^cm -mca btl_openib_if_include cxgb3_0 ./osu_acc_latency
>
> The error I get is the following (in all of the non-working cases):
>
> --------------------------------------------------------------------------
> WARNING: The largest queue pair buffer size specified in the
> btl_openib_receive_queues MCA parameter is smaller than the maximum
> send size (i.e., the btl_openib_max_send_size MCA parameter), meaning
> that no queue is large enough to receive the largest possible incoming
> message fragment. The OpenFabrics (openib) BTL will therefore be
> deactivated for this run.
>
>   Local host:                 smallMPI
>   Largest buffer size:        65536
>   Maximum send fragment size: 131072
> --------------------------------------------------------------------------
>
> --------------------------------------------------------------------------
> No OpenFabrics connection schemes reported that they were able to be
> used on a specific port. As such, the openib BTL (OpenFabrics
> support) will be disabled for this port.
>
>   Local host:     bigMPI
>   Local device:   cxgb3_0
>   Local port:     1
>   CPCs attempted: udcm
> --------------------------------------------------------------------------
>
> I have a vague understanding of what the message is trying to say, but I
> do not know which file or configuration parameters to change to fix the
> situation.
>
> Thanks in advance
> Durga
>
> Life is complex. It has real and imaginary parts.
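
As a stopgap until the fix in the PR above is merged, something along the lines
of Gilles's suggestion should avoid the mismatch reported in the warning. This
is only a sketch, untested on this hardware: the receive-queue entries other
than the last one are the stock defaults as I remember them, and the buffer
counts are illustrative rather than tuned values (check ompi_info --all on your
build for the actual defaults).

    # Option 1: cap the send fragment size so it fits the largest default
    # receive buffer (65536 bytes), matching the numbers in the warning.
    mpirun -np 2 --hostfile ~/hostfile \
        --mca btl_openib_if_include cxgb3_0 \
        --mca btl_openib_max_send_size 65536 \
        ./osu_acc_latency

    # Option 2: add a 128 KiB shared receive queue so a 131072-byte fragment
    # fits. Each queue spec is TYPE,buffer_size,num_buffers[,...], where
    # P = per-peer and S = shared receive queue.
    mpirun -np 2 --hostfile ~/hostfile \
        --mca btl_openib_if_include cxgb3_0 \
        --mca btl_openib_receive_queues \
            P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,131072,1024,1008,64 \
        ./osu_acc_latency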