Re: [OMPI users] Silent hangs with MPI_Ssend and MPI_Irecv

2020-07-25 Thread Lewis,Sean via users
Oops, I knew I forgot something! I am using OpenMPI 3.1.1. I have tried loading an OpenMPI 4.0.3 module but receive a repeating error at runtime:

[tcn560.bullx:16698] pml_ucx.c:175 Error: Failed to receive UCX worker address: Not found (-13)
[tcn560.bullx:16698] [[42671,6],29] ORTE_ERROR_LOG:

Re: [OMPI users] Silent hangs with MPI_Ssend and MPI_Irecv

2020-07-25 Thread Gilles Gouaillardet via users
Sean,

you might also want to confirm that openib is (part of) the issue by running your app on TCP only:

mpirun --mca pml ob1 --mca btl tcp,self ...

Cheers,
Gilles

- Original Message -
> Hi Sean,
>
> Thanks for the report! I have a few questions/suggestions:
>
> 1) What version of Open

Re: [OMPI users] Silent hangs with MPI_Ssend and MPI_Irecv

2020-07-25 Thread Joseph Schuchart via users
Hi Sean,

Thanks for the report! I have a few questions/suggestions:

1) What version of Open MPI are you using?
2) What is your network? It sounds like you are on an IB cluster using btl/openib (which is essentially discontinued). Can you try the Open MPI 4.0.4 release with UCX instead of openi