Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Ben Menadue
Hi Nathan, Howard, Thanks for the feedback. Yes, we do already have UCX compiled into our OpenMPI installations, but it’s disabled by default on our system because some users were reporting problems with it previously. But I’m not sure what the status of these is with OpenMPI 3.0, something f

Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Nathan Hjelm
Honestly, this is a configuration issue with the openib btl. There is no reason to keep eager RDMA, nor is there a reason to pipeline RDMA. I haven't found an app where either of these "features" helps you with InfiniBand. You have the right idea with the parameter changes but Howard is

Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Howard Pritchard
Hello Ben, Thanks for the info. You would probably be better off installing UCX on your cluster and rebuilding your Open MPI with the --with-ucx configure option. Here's what I'm seeing with Open MPI 3.0.1 on a ConnectX5 based cluster using ob1/openib BTL: mpirun -map-by ppr:1:node -np 2 ./osu

Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Ben Menadue
Hi, Another interesting point. I noticed that the last two message sizes tested (2MB and 4MB) are lower than expected for both osu_bw and osu_bibw. Increasing the minimum size to use the RDMA pipeline to above these sizes brings those two data-points up to scratch for both benchmarks: 3.0.0, o
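
The adjustment described in this message could be tried along the following lines. This is only a sketch and not the command from the original post: it assumes that `btl_openib_min_rdma_pipeline_size` is the openib BTL parameter that sets the minimum message size for the RDMA pipeline, and the 8 MiB value is an illustrative choice that sits above the 4MB messages. The `./osu_bw` and `./osu_bibw` paths are assumed locations of the benchmark binaries.

```shell
#!/bin/sh
# Sketch: raise the RDMA pipeline threshold above the largest OSU message size.
# Assumption: 8 MiB is an illustrative value, chosen only because it exceeds 4MB.
MIN_PIPELINE=$(( 8 * 1024 * 1024 ))

# Build the mpirun command line for one benchmark binary.
pipeline_cmd() {
    echo "mpirun -map-by ppr:1:node -np 2 --mca btl_openib_min_rdma_pipeline_size $MIN_PIPELINE $1"
}

for bench in ./osu_bw ./osu_bibw; do
    cmd=$(pipeline_cmd "$bench")
    echo "$cmd"
    # Only launch when mpirun and the benchmark binary are actually present.
    if command -v mpirun >/dev/null 2>&1 && [ -x "$bench" ]; then
        $cmd
    fi
done
```

Comparing the 2MB and 4MB rows with and without the parameter shows whether the pipeline threshold is what lowers those two data points.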

[OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0

2018-04-05 Thread Ben Menadue
Hi, We’ve just been running some OSU benchmarks with OpenMPI 3.0.0 and noticed that osu_bibw gives nowhere near the bandwidth I’d expect (this is on FDR IB). However, osu_bw is fine. If I disable eager RDMA, then osu_bibw gives the expected numbers. Similarly, if I increase the number of eager
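
For reference, the comparison described in this message could be reproduced roughly as follows. This is a sketch under stated assumptions, not the commands from the original post: it assumes `btl_openib_use_eager_rdma` is the openib BTL parameter that switches eager RDMA on and off, and that `./osu_bibw` is a locally built OSU benchmark binary. The `-map-by ppr:1:node -np 2` layout is the one quoted in Howard's reply in this thread.

```shell
#!/bin/sh
# Sketch: print (and, if possible, run) osu_bibw with eager RDMA on and off.
# Assumption: ./osu_bibw is the OSU bidirectional bandwidth benchmark binary.
BENCH=${BENCH:-./osu_bibw}

# Build the mpirun command line for a given eager RDMA setting (1 = on, 0 = off).
build_cmd() {
    echo "mpirun -map-by ppr:1:node -np 2 --mca btl_openib_use_eager_rdma $1 $BENCH"
}

for setting in 1 0; do
    cmd=$(build_cmd "$setting")
    echo "$cmd"
    # Only launch when mpirun and the benchmark are actually present.
    if command -v mpirun >/dev/null 2>&1 && [ -x "$BENCH" ]; then
        $cmd
    fi
done
```

Running both settings back to back on the same pair of nodes keeps the comparison fair, since only the eager RDMA switch changes between the two runs.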