Re: [OMPI users] HPL: Error occurred in MPI_Recv

2022-06-09 Thread Collin Strassburger via users
Since it is happening on this cluster and not on others, have you checked the InfiniBand counters to ensure it’s not a bad cable or something along those lines? I believe the command is ibdiag (or something similar).

Collin

From: users On Behalf Of Bart Willems via users
Sent: Thursday, June
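
For anyone following up on the counter check suggested above, here is a minimal sketch. It assumes a Linux node whose HCA driver exposes the standard per-port counters under /sys/class/infiniband. The selection of counters and the function name nonzero_ib_counters are illustrative assumptions, not something taken from this thread. The same counters can typically also be read with perfquery from the infiniband-diags package.

```python
from pathlib import Path

# Counters that often point at a bad cable or port. The names come from the
# Linux sysfs InfiniBand ABI; which ones to watch is an assumption.
SUSPECT_COUNTERS = (
    "symbol_error",
    "link_error_recovery",
    "link_downed",
    "port_rcv_errors",
)


def nonzero_ib_counters(root="/sys/class/infiniband"):
    """Return {(device, port, counter): value} for each non-zero suspect counter."""
    found = {}
    # Layout: <root>/<device>/ports/<port>/counters/<counter>
    for counters_dir in sorted(Path(root).glob("*/ports/*/counters")):
        port = counters_dir.parent.name
        device = counters_dir.parent.parent.parent.name
        for name in SUSPECT_COUNTERS:
            try:
                value = int((counters_dir / name).read_text().strip())
            except (OSError, ValueError):
                continue  # counter missing or unreadable on this HCA
            if value:
                found[(device, port, name)] = value
    return found


if __name__ == "__main__":
    for (device, port, name), value in nonzero_ib_counters().items():
        print(f"{device} port {port}: {name} = {value}")
```

Running this on each node and comparing the output before and after an HPL run should show whether any of these error counters is climbing on one side of the link.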

[OMPI users] HPL: Error occurred in MPI_Recv

2022-06-09 Thread Bart Willems via users
Hello,

I am attempting to run High Performance Linpack (2.3) between 2 nodes with Open MPI 4.1.4 and MLNX_OFED_LINUX-5.6-2.0.9.0-rhel8.6-x86_64. Within a minute or so, the run always crashes with

[node002:04556] *** An error occurred in MPI_Recv
[node002:04556] *** reported by process [1007222785