Hi again,
I managed to reproduce the "bug" with a simple case (see the cpp file
attached).
I am running this on 2 nodes with 8 cores each. If I run with
mpiexec ./test-mpi-latency.out
then the MPI_Ssend operations take about 1e-5 seconds for intra-node
ranks, and ~11 seconds for inter-node ra
Hi again,
I found out that if I add an
MPI_Barrier after the MPI_Recv part, then there is no minute-long latency.
Is it possible that even after MPI_Recv returns, the openib btl does not
guarantee that the acknowledgement is sent promptly? In other words, is
it possible that the computation follo