Yalla works because MXM defaults to unreliable datagrams (UD); I don’t think 
it uses RC unless you ask for it. Is this a fully connected algorithm? I ask 
because (3584 - 28) off-node peers * 28 ranks per node * 3 (the default 
number of QPs per remote process in btl/openib) = 298,704 > 262,144. This is 
the problem with RC.
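
To spell that arithmetic out (assuming one HCA per node and that every rank 
connects to every off-node rank):

    off-node peers per rank:  3584 - 28            = 3556
    QPs needed per node:      3556 * 28 ranks * 3  = 298,704

298,704 exceeds the 262,144 QPs the HCA supports, and that is before IPoIB, 
Lustre, and the like take their share.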
Mellanox solved this on mlx5 systems by adding another transport, Dynamically 
Connected (DC). The openib btl does not (and probably never will) support DC. 
The recommended path is OpenUCX, which is effectively the long-term 
replacement for ibverbs.
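
If you need to stay on the openib btl for now, one possible stopgap (a sketch 
only; I have not tried this on your system, and the buffer counts in the 
queue spec below are placeholders, not tuned values) is to cut the per-peer 
QP count from three to one with a single shared-receive-queue spec:

    mpirun --mca btl_openib_receive_queues S,65536,256,128,32 ...

That changes the per-node requirement to 3556 * 28 * 1 = 99,568 QPs, back 
under the limit, possibly at some cost in small-message performance. XRC 
queues (the X,... specs) can reduce the count further by sharing QPs among 
on-node processes, if your stack supports XRC. Longer term, an Open MPI build 
with UCX support can select the UCX PML and avoid the fully connected RC mesh 
entirely:

    mpirun --mca pml ucx ...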

-Nathan

> On Mar 13, 2018, at 8:43 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
> 
> Hi,
> 
> One of our users is having trouble scaling his code up to 3584 cores (i.e. 
> 128 28-core nodes). It runs fine on 1792 cores (64 nodes), but fails with 
> this at 3584:
> 
> --------------------------------------------------------------------------
> A process failed to create a queue pair. This usually means either
> the device has run out of queue pairs (too many connections) or
> there are insufficient resources available to allocate a queue pair
> (out of memory). The latter can happen if either 1) insufficient
> memory is available, or 2) no more physical memory can be registered
> with the device.
> For more information on memory registration see the Open MPI FAQs at:
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
> Local host: r3735
> Local device: mlx5_0
> Queue pair type: Reliable connected (RC)
> --------------------------------------------------------------------------
> 
> Looking at the node in question, sure enough there’s a message in dmesg:
> 
> [347071.005636] mlx5_core 0000:06:00.0: mlx5_cmd_check:727:(pid 31507): 
> CREATE_QP(0x500) op_mod(0x0) failed, status bad resource(0x5), syndrome 
> (0x65b500)
> 
> I’m pretty sure 0x65b500 means “out of queue pairs”.
> 
> Our HCAs support 262144 QPs, and while some of these will be used for e.g. 
> IPoIB and Lustre, I wouldn’t expect to be running out at such a low number of 
> cores — and indeed, I’ve run much larger jobs without seeing this issue.
> 
> This is using the 1.10 series, with the ob1 PML and the openib BTL. If they 
> use Yalla, it works fine, but it would still be good to get it working using 
> the “standard” communication path, without needing the accelerators.
> 
> I was wondering if anyone has seen this before, and if anyone had any suggestions 
> for how to proceed?
> 
> Thanks,
> Ben
> 
