Hi Sean,
Thanks for the report! I have a few questions/suggestions:
1) What version of Open MPI are you using?
2) What is your network? It sounds like you are on an IB cluster using
btl/openib (which is essentially discontinued). Can you try the Open MPI
4.0.4 release with UCX instead of openib (configure with --without-verbs
and --with-ucx)?
3) If that does not help, can you boil your code down to a minimum
working example? That would make it easier for people to try to
reproduce what happens.
Cheers
Joseph
On 7/24/20 11:34 PM, Lewis,Sean via users wrote:
Hi all,
I am encountering a silent hang involving MPI_Ssend and MPI_Irecv. The
subroutine in question is called by each processor and is structured
similar to the pseudo code below. The subroutine is successfully called
several thousand times before the silent hang behavior manifests and
never resolves. The hang will occur in nearly (but not exactly) the same
spot for bit-wise identical tests. During the hang, all MPI ranks will
be at the Line 18 Barrier except for two: one at the Line 17 Waitall, waiting
for its Irecvs to complete, and the other at one of the Ssends (Line 9 or 14).
This suggests that an MPI_Irecv never completes and a processor is
indefinitely blocked in its Ssend, unable to complete the transfer.
I’ve found a similar discussion of this kind of behavior on the Open MPI
mailing list:
https://www.mail-archive.com/users@lists.open-mpi.org/msg19227.html
which was ultimately resolved by setting the MCA parameter btl_openib_flags
to 304 or 305 (default 310):
https://www.mail-archive.com/users@lists.open-mpi.org/msg19277.html
I have seen some promising behavior by doing the same. As that thread
suggests, this implies a problem with the RDMA protocols in InfiniBand for
large messages.
I wanted to breathe life back into this conversation, as the silent hang
issue is particularly debilitating and confusing to me. Increasing or
decreasing the number of processors does not seem to alleviate the issue, and
using MPI_Send results in the same behavior; perhaps a message has exceeded a
memory limit? I am running a test now that reports the individual message
sizes, but a switch I previously implemented to check for buffer size
discrepancies is never triggered. In the meantime, has anyone run into
similar issues or have thoughts on remedies for this behavior?
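For concreteness, here is a minimal sketch of the kind of per-message report
I am adding (the subroutine name and format are illustrative, not the actual
diagnostic code); it would be called just before each MPI_Ssend so that the
last message attempted before a hang shows up in the job output:

subroutine report_send(mype, dest, count)
  use iso_fortran_env, only: output_unit
  implicit none
  integer, intent(in) :: mype, dest, count
  ! Print sender, destination, and message size, then flush immediately
  ! so the line is visible even if the run hangs afterwards.
  write(output_unit, '(A,I0,A,I0,A,I0,A)') 'rank ', mype, ' -> rank ', &
       dest, ': ', count, ' elements'
  flush(output_unit)
end subroutine report_send

The pseudo code referenced above is: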
1: call MPI_BARRIER(…)
2: do i = 1,nprocs
3:   if(commatrix_recv(i) .gt. 0) then ! Identify which procs to receive from via predefined matrix
4:     call MPI_Irecv(…)
5:   endif
6: enddo
7: do j = mype+1,nprocs
8:   if(commatrix_send(j) .gt. 0) then ! Identify which procs to send to via predefined matrix
9:     call MPI_Ssend(…)
10:  endif
11: enddo
12: do j = 1,mype
13:  if(commatrix_send(j) .gt. 0) then ! Identify which procs to send to via predefined matrix
14:    call MPI_Ssend(…)
15:  endif
16: enddo
17: call MPI_Waitall(…) ! Wait for all Irecvs to complete
18: call MPI_Barrier(…)
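If it helps anyone experiment, here is a self-contained sketch of the same
communication pattern (assuming, for brevity, an all-to-all exchange of
fixed-size integer buffers instead of the predefined
commatrix_send/commatrix_recv selection; nwords and the buffer names are
illustrative, not from the production code):

program exchange_sketch
  use mpi
  implicit none
  integer, parameter :: nwords = 100000            ! per-message size (illustrative)
  integer :: ierr, mype, nprocs, i, j, nrecv
  integer, allocatable :: sendbuf(:,:), recvbuf(:,:), reqs(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, mype, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  allocate(sendbuf(nwords,nprocs), recvbuf(nwords,nprocs), reqs(nprocs))
  sendbuf = mype

  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  ! Post all receives first (Lines 2-6 of the pseudo code).
  nrecv = 0
  do i = 0, nprocs-1
     if (i /= mype) then
        nrecv = nrecv + 1
        call MPI_Irecv(recvbuf(:,i+1), nwords, MPI_INTEGER, i, 0, &
                       MPI_COMM_WORLD, reqs(nrecv), ierr)
     end if
  end do

  ! Synchronous sends to higher ranks, then to lower ranks (Lines 7-16).
  do j = mype+1, nprocs-1
     call MPI_Ssend(sendbuf(:,j+1), nwords, MPI_INTEGER, j, 0, MPI_COMM_WORLD, ierr)
  end do
  do j = 0, mype-1
     call MPI_Ssend(sendbuf(:,j+1), nwords, MPI_INTEGER, j, 0, MPI_COMM_WORLD, ierr)
  end do

  ! Wait for all posted receives, then synchronize (Lines 17-18).
  call MPI_Waitall(nrecv, reqs(1:nrecv), MPI_STATUSES_IGNORE, ierr)
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  if (mype == 0) print *, 'exchange complete'
  call MPI_Finalize(ierr)
end program exchange_sketch

Note that the sketch uses 0-based ranks, whereas the pseudo code above
indexes the communication matrices from 1.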
Cluster information:
30 processors
Managed by Slurm
OS: Red Hat v. 7.7
Thank you for any help/advice you can provide,
Sean
*Sean C. Lewis*
Doctoral Candidate
Department of Physics
Drexel University