[OMPI users] Verbose logging options to track IB communication issues

2022-02-16 Thread Shan-ho Tsai via users
Greetings, We are troubleshooting an IB network fabric issue that is causing some of our MPI applications to fail with errors like this: -- The InfiniBand retry count between two MPI processes has been exceeded. "Retry

Re: [OMPI users] Verbose logging options to track IB communication issues

2022-02-23 Thread Shan-ho Tsai via users
application on half the nodes, then the other half. My hunch is that you will find faulty cables. I can of course be very wrong and it is something that this application triggers. On Wed, 16 Feb 2022 at 19:28, Shan-ho Tsai via users <users@lists.open-mpi.org> wrote: Greeting