Hello, I am testing the newly upgraded OFED (5.1-0.6.6) together with the corresponding OpenMPI builds (4.0.2 and 4.0.4).
For reasons I cannot identify, I get a communication error. (There is no error with the combination of OFED 4.6-1.0.1 and OpenMPI 4.0.2.) When communicating between compute nodes (inter-node), the error below occurs whenever the send/recv message size exceeds 65535 MPI_REAL elements. It does not happen when running on a single compute node.

If there are any points worth checking, I would appreciate any hints, however trivial.

Best Regards,
Kihang

Part of the error message:

[pduru18:351568:0:351568] ib_mlx5_log.c:143 Transport retry count exceeded on mlx5_2:1/RoCE (synd 0x15 vend 0x81 hw_synd 0/0)
[pduru18:351568:0:351568] ib_mlx5_log.c:143 RC QP 0x139d4 wqe[0]: RDMA_READ s-- [rva 0x2b9827e90a40 rkey 0x182ab] [va 0x2b270e05ca00 len 219136 lkey 0x3c2b]
[pduru18:351565:0:351565] ib_mlx5_log.c:143 Transport retry count exceeded on mlx5_2:1/RoCE (synd 0x15 vend 0x81 hw_synd 0/0)
[pduru18:351565:0:351565] ib_mlx5_log.c:143 RC QP 0x139d3 wqe[0]: RDMA_READ s-- [rva 0x2ac9d73be980 rkey 0x8b395] [va 0x2b464c51bc00 len 223232 lkey 0x5e4b]
[pduru18:351571:0:351571] ib_mlx5_log.c:143 Transport retry count exceeded on mlx5_2:1/RoCE (synd 0x15 vend 0x81 hw_synd 0/0)
[pduru18:351571:0:351571] ib_mlx5_log.c:143 RC QP 0x139d2 wqe[0]: RDMA_READ s-- [rva 0x2b0072dd1980 rkey 0x55fea] [va 0x2b70590d8c00 len 223232 lkey 0x715b]

Backtrace printed by the executable (the output from the two threads appears to be interleaved, so most frames show up twice):

==== backtrace (tid: 351571) ====
 0 0x000000000004ed85 ucs_debug_print_backtrace()  ???:0
 1 0x000000000001f9c2 uct_ib_mlx5_completion_with_err()  ???:0
 2 0x000000000002e736 uct_rc_mlx5_iface_is_reachable()  ???:0
==== backtrace (tid: 351569) ====
 0 0x000000000004ed85 ucs_debug_print_backtrace()  ???:0
 1 0x000000000001f9c2 uct_ib_mlx5_completion_with_err()  ???:0
 2 0x000000000002e736 uct_rc_mlx5_iface_is_reachable()  ???:0
 3 0x0000000000030481 uct_rc_mlx5_iface_progress()  ???:0
 4 0x0000000000022f3a ucp_worker_progress()  ???:0
 5 0x0000000000038574 opal_progress()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/opal/runtime/opal_progress.c:231
 6 0x00000000000569f7 ompi_request_wait_completion()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/ompi/../ompi/request/request.h:415
 3 0x0000000000030481 uct_rc_mlx5_iface_progress()  ???:0
 4 0x0000000000022f3a ucp_worker_progress()  ???:0
 5 0x0000000000038574 opal_progress()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/opal/runtime/opal_progress.c:231
 6 0x00000000000569f7 ompi_request_wait_completion()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/ompi/../ompi/request/request.h:415
 7 0x00000000000569f7 ompi_request_default_wait()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/ompi/request/req_wait.c:42
 8 0x0000000000084772 PMPI_Wait()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/ompi/mpi/c/profile/pwait.c:74
 7 0x00000000000569f7 ompi_request_default_wait()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/ompi/request/req_wait.c:42
 8 0x0000000000084772 PMPI_Wait()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/ompi/mpi/c/profile/pwait.c:74
 9 0x000000000005b26f ompi_wait_f()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/ompi/mpi/fortran/mpif-h/profile/pwait_f.c:76
10 0x00000000005b1642 swap3d_()  ???:0
11 0x00000000004a6eb4 hdiff_()  ???:0
12 0x000000000046bf81 sciproc_()  ???:0
 9 0x000000000005b26f ompi_wait_f()  /export/home/nwp/OFED_TEST/KMALIB/src/openmpi/openmpi-4.0.4/ompi/mpi/fortran/mpif-h/profile/pwait_f.c:76
10 0x00000000005b1642 swap3d_()  ???:0
11 0x00000000004a6eb4 hdiff_()  ???:0
12 0x000000000046bf81 sciproc_()  ???:0
13 0x0000000000462418 MAIN__()  ???:0
14 0x000000000040bfde main()  ???:0
15 0x00000000000223d5 __libc_start_main()  ???:0
16 0x000000000040bee9 _start()  ???:0

Part of the source code:

      ntotal = 65536            ! no error with 65535
      if (recvproc >= 0) then
         allocate(rbuf(ntotal))
         call MPI_Irecv(rbuf,ntotal,MPI_REAL,recvproc,0,MPI_COMM_WORLD,
     $                  irequest,ierror)
      endif
      if (sendproc >= 0) then
         allocate(sbuf(ntotal))
         call MPI_Send(sbuf,ntotal,MPI_REAL,sendproc,0,MPI_COMM_WORLD,
     $                 ierror)
         deallocate(sbuf)
      endif
      if (recvproc >= 0) then
         call MPI_Wait(irequest,istatus,ierror)
         deallocate(rbuf)
      endif
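In case it helps with reproducing, below is a minimal self-contained version of the exchange. It follows the same Irecv/Send/Wait pattern as the fragment above; the rank pairing (each even rank sends to the next odd rank) is my own simplification and is not taken from our actual code.

program repro
   use mpi
   implicit none
   integer :: ierror, irequest, myrank, nprocs
   integer :: sendproc, recvproc, ntotal
   integer :: istatus(MPI_STATUS_SIZE)
   real, allocatable :: sbuf(:), rbuf(:)

   call MPI_Init(ierror)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierror)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierror)

   ntotal = 65536      ! no error up to 65535; fails above that between nodes

   ! Simplified pairing: even ranks send to the next rank, odd ranks receive
   ! from the previous one; ranks without a partner stay idle.
   sendproc = -1
   recvproc = -1
   if (mod(myrank, 2) == 0 .and. myrank + 1 < nprocs) sendproc = myrank + 1
   if (mod(myrank, 2) == 1) recvproc = myrank - 1

   if (recvproc >= 0) then
      allocate(rbuf(ntotal))
      call MPI_Irecv(rbuf, ntotal, MPI_REAL, recvproc, 0, MPI_COMM_WORLD, &
                     irequest, ierror)
   end if

   if (sendproc >= 0) then
      allocate(sbuf(ntotal))
      sbuf = real(myrank)
      call MPI_Send(sbuf, ntotal, MPI_REAL, sendproc, 0, MPI_COMM_WORLD, ierror)
      deallocate(sbuf)
   end if

   if (recvproc >= 0) then
      call MPI_Wait(irequest, istatus, ierror)
      deallocate(rbuf)
   end if

   call MPI_Finalize(ierror)
end program repro

Running two ranks placed on different nodes (for example, mpirun -np 2 --map-by node ./repro) should hit the same failure once ntotal exceeds 65535, while a single-node run completes normally.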