these flags are available in the master and v1.10 branches, and they make sure that rank-to-core allocation starts from the CPU socket closest to the HCA.
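As a rough sketch of how to check which cores count as "close" to the HCA on Linux, the kernel's PCI locality information can be read from sysfs. The helper name hca_local_cpus and its optional second argument are hypothetical, mlx5_3 is just the device from the quoted command line, and the sysfs layout is assumed to be the usual /sys/class/infiniband/<device>/device/local_cpulist.

```shell
# Hypothetical helper: print the CPU list the kernel reports as local
# to an InfiniBand device.
#   $1: device name, e.g. mlx5_3
#   $2: sysfs root (assumption: defaults to /sys; overridable for testing)
hca_local_cpus() {
    cat "${2:-/sys}/class/infiniband/$1/device/local_cpulist"
}
```

Running `hca_local_cpus mlx5_3` should print a core range for the socket nearest that device (or all cores if the platform exposes no locality information), which can be compared with the mapping that -display-map prints.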
Of course, you can get the same effect with taskset.

On Mon, Oct 5, 2015 at 8:46 PM, Dave Love <d.l...@liverpool.ac.uk> wrote:

> Mike Dubman <mi...@dev.mellanox.co.il> writes:
>
> > what is your command line and setup? (ofed version, distro)
> >
> > This is what was just measured w/ fdr on haswell with v1.8.8 and mxm and UD
> >
> > + mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy
> > dist:span -x MXM_RDMA_PORTS=mlx5_3:1 -mca rmaps_dist_device mlx5_3:1 -x
> > MXM_TLS=self,shm,ud osu_latency
>
> Revisiting this, I'm confused, because rmaps_dist_device isn't in my
> build and I don't know what it is. (I tried the binary hpcx stuff, but
> it failed to run -- I've forgotten how -- and the build instructions for
> ompi under it correspond to what I've used anyway.) The obvious
> difference between the above and what I have is mlx5 v. mlx4; is that
> likely to account for it?
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/10/27801.php

--
Kind Regards,
M.