Re: [OMPI users] Exhausting QPs?

2018-03-13 Thread Nathan Hjelm
Yalla works because MXM defaults to using unconnected datagrams (I don’t think it uses RC unless you ask). Is this a fully connected algorithm? I ask because (3584 - 28) * 28 * 3 (default number of QPs/remote process in btl/openib) = 298704 > 262144. This is the problem with RC. Mellanox solved…
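The arithmetic in the reply above can be checked with a short sketch. It assumes that every one of the 28 ranks on a node opens 3 RC queue pairs to every rank on other nodes, and that on-node peers are excluded (presumably because they talk over shared memory rather than btl/openib). It also assumes the 262144 figure is the per-adapter QP limit the reply compares against; the excerpt is truncated before saying so. The function name `rc_qps_per_node` is hypothetical.

```python
# Hypothetical helper: estimate RC queue pairs needed on one node when every
# local rank connects to every off-node rank (fully connected pattern).
def rc_qps_per_node(total_ranks: int, ranks_per_node: int, qps_per_peer: int = 3) -> int:
    # Assumption: on-node peers do not use openib QPs, so subtract them.
    remote_peers = total_ranks - ranks_per_node
    return remote_peers * ranks_per_node * qps_per_peer


QP_LIMIT = 262144  # figure quoted in the reply (2**18); assumed to be the HCA limit

for cores in (1792, 3584):
    need = rc_qps_per_node(cores, 28)
    status = "exceeds" if need > QP_LIMIT else "fits under"
    print(f"{cores} ranks: {need} QPs per node, {status} {QP_LIMIT}")
```

Under these assumptions the 64-node run needs 148176 QPs per node, which fits, while the 128-node run needs 298704, which does not. That is consistent with the original report that the job works at 1792 cores and fails at 3584.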

[OMPI users] Exhausting QPs?

2018-03-13 Thread Ben Menadue
Hi, One of our users is having trouble scaling his code up to 3584 cores (i.e. 128 28-core nodes). It runs fine on 1792 cores (64 nodes), but fails with this at 3584: -- A process failed to create a queue pair. This usually…

Re: [OMPI users] How to Build OpenMPI to support FDR over SR-IOV

2018-03-13 Thread Jeff Squyres (jsquyres)
Pharthiphan -- No need to cross-post the same question in three places (GitHub issue, this list, and the devel list). Let's keep the thread on the devel list, where the first parts of your questions have already been answered. Thanks. > On Mar 13, 2018, at 11:30 AM, Pharthiphan Asokan wrote…

Re: [OMPI users] openmpi crashes for more than 1 MPI

2018-03-13 Thread Gilles Gouaillardet
Hi, I think it is really time to upgrade Open MPI. Supported versions are 2.1.2 and 3.0.0. Open MPI 1.4 is really old now and I doubt you will ever get any support on that version. Cheers, Gilles On 3/13/2018 3:58 PM, abhisek Mondal wrote: Hi, I'm having a strange issue with Openmpi-…

[OMPI users] openmpi crashes for more than 1 MPI

2018-03-13 Thread abhisek Mondal
Hi, I'm having a strange issue with Openmpi-1.4. Whenever I try to run a program with number of mpi more than 1, it crashes. For instance the following code: mpirun -np 2 -bynode `which relion_refine_mpi` --gpu --tau2_fudge 2 --scale --dont_combine_weights_via_disc --iter 25 --norm --psi_ste…