I am still seeing a problem where two processes send to the same receiving process and one of the senders hangs for 150 to 550 ms inside the call to MPI_Send.
Each process runs on a different node, and the receiving process has posted an MPI_Irecv 17 ms before the hanging send. The posted receive buffers are 172K and the messages being sent are 81K. I have set mpi_leave_pinned to 1 and have increased btl_openib_receive_queues to ...:S,65536,512,256,64. How can I trace the individual phases of message passing to diagnose where the send is stalling?
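For reference, this is roughly how a stall like the one described can be measured from the application side: a minimal sketch in plain C that timestamps a blocking call and counts the calls that exceed a threshold. All names in it (now_seconds, timed_call, call_stats, sleep_for_ms) are hypothetical and not part of MPI or Open MPI, and sleep_for_ms is only a stand-in for the blocking send. In an MPI program the wrapped call would be MPI_Send and the clock would normally be MPI_Wtime. This only shows how long the call as a whole took, not which internal phase was slow.

```c
/* Sketch only: time a blocking call and flag it when it exceeds a threshold.
 * Every name below is hypothetical; nothing here is an MPI or Open MPI API. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Monotonic time in seconds (MPI_Wtime() would play this role in MPI code). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Signature of the call being timed; returns 0 on success. */
typedef int (*blocking_fn)(void *arg);

typedef struct {
    double threshold_s;  /* calls longer than this are reported */
    double worst_s;      /* longest call seen so far */
    long   calls;        /* total number of timed calls */
    long   slow_calls;   /* number of calls above the threshold */
} call_stats;

/* Run fn(arg), record its duration, and log it if it was slow. */
static int timed_call(blocking_fn fn, void *arg, const char *label,
                      call_stats *st)
{
    double t0 = now_seconds();
    int rc = fn(arg);
    double dt = now_seconds() - t0;

    st->calls++;
    if (dt > st->worst_s)
        st->worst_s = dt;
    if (dt > st->threshold_s) {
        st->slow_calls++;
        fprintf(stderr, "%s: call %ld took %.3f ms (started at %.6f s)\n",
                label, st->calls, dt * 1e3, t0);
    }
    return rc;
}

/* Stand-in for a blocking send: sleeps for *(int *)arg milliseconds. */
static int sleep_for_ms(void *arg)
{
    int ms = *(int *)arg;
    struct timespec req;
    req.tv_sec  = ms / 1000;
    req.tv_nsec = (long)(ms % 1000) * 1000000L;
    return nanosleep(&req, NULL);
}
```

With a threshold of, say, 50 ms, the sender's log would show the start time of each slow call, which can then be lined up against the time the receiver posted its MPI_Irecv. The same idea can be applied without touching the application source through the MPI standard's profiling interface, where a user-supplied MPI_Send records the timestamps and forwards to PMPI_Send.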
ompi-output.tar.bz2