Yalla works because MXM defaults to using unconnected datagrams (I don’t think
it uses RC unless you ask). Is this a fully connected algorithm? I ask because
(3584 - 28) * 28 * 3 (3 being the default number of QPs per remote process in
btl/openib) = 298704 > 262144. This is the problem with RC. Mellanox solved
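
The estimate above can be checked with a minimal sketch. The function name is hypothetical, and the sketch assumes, as the estimate does, a fully connected job in which every process on a node opens 3 QPs to every process on the other nodes. The 262144 figure is taken from the message and treated here as the limit the per-node total is compared against.

```python
def rc_queue_pairs_per_node(total_procs, procs_per_node, qps_per_peer):
    """Estimate the RC queue pairs one node needs in a fully connected job.

    Hypothetical helper: each local process is assumed to open
    qps_per_peer QPs to every process that lives on another node.
    """
    remote_procs = total_procs - procs_per_node
    return remote_procs * procs_per_node * qps_per_peer


QP_LIMIT = 262144  # figure quoted in the message; assumed to be the per-node limit

needed = rc_queue_pairs_per_node(total_procs=3584, procs_per_node=28, qps_per_peer=3)
print(needed, needed > QP_LIMIT)  # 298704 True
```

Under the same assumptions, 1792 processes give (1792 - 28) * 28 * 3 = 148176, which is below 262144 and is consistent with the job running fine on 64 nodes.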
Hi,
One of our users is having trouble scaling his code up to 3584 cores (i.e. 128
28-core nodes). It runs fine on 1792 cores (64 nodes), but fails with this at
3584:
--
A process failed to create a queue pair. This usually
Pharthiphan --
No need to cross-post the same question in three places (GitHub issue, this
list, and the devel list).
Let's keep the thread on the devel list, where the first parts of your
questions have already been answered.
Thanks.
> On Mar 13, 2018, at 11:30 AM, Pharthiphan Asokan wrote
Hi,
I think it is really time to upgrade Open MPI.
The supported versions are 2.1.2 and 3.0.0.
Open MPI 1.4 is really old now, and I doubt you will ever get any support
for that version.
Cheers,
Gilles
On 3/13/2018 3:58 PM, abhisek Mondal wrote:
Hi,
I'm having a strange issue with Openmpi-
Hi,
I'm having a strange issue with Openmpi-1.4.
Whenever I try to run a program with more than one MPI process, it crashes.
For instance, the following command:
mpirun -np 2 -bynode `which relion_refine_mpi` --gpu --tau2_fudge 2
--scale --dont_combine_weights_via_disc --iter 25 --norm --psi_ste