> > The differences I see between guillimin and colosse are
> >
> > - Open-MPI 1.4.3 (colosse) v. MVAPICH2 1.6 (guillimin)
> > - Mellanox (colosse) v. QLogic (guillimin)
> >
> > Has anyone experienced such high latency with Open-MPI 1.4.3 on
> > Mellanox HCAs?
>
Ole Nielsen wrote:
Thanks for your suggestion, Gus; we need a way of debugging what is going
on. I am pretty sure the problem lies with our cluster configuration. I
know MPI simply relies on the underlying network. However, we can ping
and ssh to all nodes (and between any pair of them as well), so it does
not appear to be a basic connectivity problem.
> 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't
> happen until the third iteration. I take that to mean that the basic
> communication works, but that something is saturating. Is there some notion
> of buffer size somewhere in the MPI system that could explain this?
>
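There is such a notion: Open MPI sends short messages "eagerly" into
preallocated receive-side buffers, while messages above the eager limit
wait for a rendezvous with the matching receive (for the openib BTL you
can inspect this with: ompi_info --param btl openib | grep eager). As an
illustration (this is a sketch, not Ole's attached program, and the exact
threshold depends on the BTL settings), here is a pattern that "works"
for small messages but freezes once the payload exceeds the eager limit:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define COUNT (1 << 20)   /* 8 MB of doubles: far above any eager limit */

int main(int argc, char **argv)
{
    int rank;
    double *out = malloc(COUNT * sizeof(double));
    double *in  = malloc(COUNT * sizeof(double));

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* run with mpirun -np 2 */

    /* Both ranks send before they receive.  With a small COUNT this
     * completes because the sends are buffered eagerly; above the eager
     * limit both MPI_Send calls block waiting for a rendezvous with a
     * receive that is never posted, and the run freezes. */
    MPI_Send(out, COUNT, MPI_DOUBLE, 1 - rank, 0, MPI_COMM_WORLD);
    MPI_Recv(in,  COUNT, MPI_DOUBLE, 1 - rank, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    printf("rank %d done\n", rank);
    free(out);
    free(in);
    MPI_Finalize();
    return 0;
}

Reversing the send/receive order on one rank, or using MPI_Sendrecv,
removes the dependence on buffering entirely.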
On Sep 19, 2011, at 10:23 PM, Ole Nielsen wrote:
> Hi all - and sorry for the multiple postings, but I have more information.
+1 on Eugene's comments. The test program looks fine to me.
FWIW, you don't need -lmpi to compile your program; OMPI's wrapper compiler
allows you to just:
mpicc m
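(If you want to see exactly what the wrapper adds, Open MPI's compiler
wrappers accept --showme; for example, with a hypothetical source file:

mpicc --showme hello.c -o hello

prints the underlying compile line, including the MPI include and library
flags, without running it.)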
Hello Ole
I ran your program on open-mpi-1.4.2 five times, and all five times, it
finished successfully.
So I think the problem was with the version of MPI.
Output from your program is attached. I ran on 3 nodes:
$HOME/OpenMPI-1.4.2/bin/mpirun -np 3 -v --output-filename mpi_testfile
./mpi_t
Further to the posting below, I can report that the test program (attached -
this time correctly) is chewing up CPU time on both compute nodes for as
long as I care to let it continue.
It would appear that the program is stuck in MPI_Recv, which is the next
call after the print statements in the test program.
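A blocked MPI_Recv burning CPU is expected behavior, by the way: Open
MPI's progress engine busy-polls while waiting (unless mpi_yield_when_idle
is set), so high CPU usage by itself does not mean the ranks are making
progress. For readers without the attachment, here is a hypothetical
sketch of a test of the shape described, each rank printing and then
passing a buffer around a ring; the names and sizes are my assumptions,
not the actual attached code:

#include <mpi.h>
#include <stdio.h>

#define N 1000000    /* assumed payload size */
#define ITERS 10

int main(int argc, char **argv)
{
    static double buf[N];
    int rank, size, it;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* needs at least 2 ranks */

    for (it = 0; it < ITERS; it++) {
        printf("rank %d starting iteration %d\n", rank, it);
        fflush(stdout);
        if (rank == 0) {
            /* rank 0 starts the ring, then waits for it to come back */
            MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, N, MPI_DOUBLE, size - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            /* everyone else receives from the left, sends to the right */
            MPI_Recv(buf, N, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, N, MPI_DOUBLE, (rank + 1) % size, 0,
                     MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}

In a healthy run every rank prints ITERS times; in the failure mode
described above, the prints stop after a couple of iterations while every
rank sits in MPI_Recv at full CPU.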
Has an