Re: [OMPI users] [EXTERNAL] OpenMPI 3.1.6 openib failure: "mlx4_0 errno says Success"

2021-10-14 Thread Pritchard Jr., Howard via users
Hi Greg, I think the UCX PML may be discomfited by the lack of thread safety. Could you try using contrib/configure-release-mt in your UCX folder? You want to add --enable-mt. That’s what stands out in your configure output compared with the one I usually get when building on a MLNX ConnectX-5 clust

Re: [OMPI users] [EXTERNAL] OpenMPI 3.1.6 openib failure: "mlx4_0 errno says Success"

2021-10-14 Thread Fischer, Greg A. via users
I added --enable-mt and reinstalled UCX. Same result. (I didn't recompile Open MPI.) A conspicuous warning I see in my UCX configure output is:

checking for rdma_establish in -lrdmacm... no
configure: WARNING: RDMACM requested but librdmacm is not found or does not provide rdma_establish() API
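
One way to confirm that a reinstalled UCX really was built with thread support is to look at the configure flags the library reports about itself. The sketch below is only an illustration: it assumes the text comes from `ucx_info -v` and that one of its lines starts with "configured with:" (the exact wording and capitalization may differ between UCX versions). The function names and the sample text are hypothetical, not taken from this thread.

```python
import shlex
from typing import List


def configured_flags(ucx_info_output: str) -> List[str]:
    """Return the configure flags found in `ucx_info -v`-style text.

    Assumption: the flags appear on a line beginning with
    "configured with:" (matched case-insensitively), optionally
    prefixed by '#'.
    """
    for line in ucx_info_output.splitlines():
        # Drop the leading '#' and spaces that prefix each line.
        text = line.lstrip("# ").strip()
        if text.lower().startswith("configured with:"):
            # Everything after the first colon is the flag list.
            return shlex.split(text.split(":", 1)[1])
    return []


def has_thread_support(ucx_info_output: str) -> bool:
    """True if --enable-mt is among the reported configure flags."""
    return "--enable-mt" in configured_flags(ucx_info_output)


# Illustrative text only, not captured from a real system.
sample = "# configured with: --prefix=/opt/ucx --enable-mt\n"
print(has_thread_support(sample))
```

On a real node the input could be collected with `subprocess.run(["ucx_info", "-v"], capture_output=True, text=True).stdout`. Since Open MPI was not recompiled, it is also worth checking that this is the UCX installation Open MPI actually loads at run time.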

Re: [OMPI users] [EXTERNAL] OpenMPI 3.1.6 openib failure: "mlx4_0 errno says Success"

2021-10-14 Thread Fischer, Greg A. via users
Thanks, Howard. I downloaded a current version of UCX (1.11.2) and installed it with Open MPI 4.1.1. When I try to specify "-mca pml ucx" for a simple two-process benchmark problem, I get: -- No components were able to be

Re: [OMPI users] [EXTERNAL] OpenMPI 3.1.6 openib failure: "mlx4_0 errno says Success"

2021-10-14 Thread Pritchard Jr., Howard via users
Hi Greg, Oh yes, that’s not good about rdmacm. Yes, the OFED looks pretty old. Did you by any chance apply that patch? I generated it for a sysadmin here who needed to maintain Open MPI 3.1.6 but also had to upgrade to a newer RHEL release, but the Open MPI