Mike Dubman <mi...@dev.mellanox.co.il> writes:

> what is your command line and setup? (ofed version, distro)
It's on up-to-date SL6 (so using whatever RHEL6 ships), running the commands below for the 1.6 and 1.8 cases respectively. The HCA is reported as mlx4_0. Core binding is configured for 1.6. I think they both had mxm available.

  mpirun -np 2 --loadbalance ./osu_latency-16
  mpirun -np 2 --map-by node ./osu_latency-18

> This is what was just measured w/ fdr on haswell with v1.8.8 and mxm and UD
>
> + mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy
>   dist:span -x MXM_RDMA_PORTS=mlx5_3:1 -mca rmaps_dist_device mlx5_3:1 -x
>   MXM_TLS=self,shm,ud osu_latency

Thanks. However, I don't know what all that and the other version is about -- I can't keep up with the continual changes in MCA stuff that one apparently has to know -- but it bothers me if I don't get reasonable results from the simplest micro-benchmark with default parameters. I'll try some variations like that when I can get complete nodes on the chassis.

> Data for JOB [65499,1] offset 0
>
> ========================   JOB MAP   ========================
>
> Data for node: clx-orion-001  Num slots: 28  Max slots: 0  Num procs: 1
>         Process OMPI jobid: [65499,1] App: 0 Process rank: 0
>
> Data for node: clx-orion-002  Num slots: 28  Max slots: 0  Num procs: 1
>         Process OMPI jobid: [65499,1] App: 0 Process rank: 1
>
> =============================================================
>
> # OSU MPI Latency Test v4.4.1
> # Size          Latency (us)
> 0               1.18
> 1               1.16
> 2               1.19
> 4               1.20
> 8               1.19
> 16              1.19
> 32              1.21
> 64              1.27
>
> and w/ ob1, openib btl:
>
> mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy
>   dist:span -mca rmaps_dist_device mlx5_3:1 -mca btl_if_include mlx5_3:1
>   -mca pml ob1 -mca btl openib,self osu_latency
>
> # OSU MPI Latency Test v4.4.1
> # Size          Latency (us)
> 0               1.13
> 1               1.17
> 2               1.17
> 4               1.17
> 8               1.22
> 16              1.23
> 32              1.25
> 64              1.28
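
A minimal sketch of how one might check which PML/BTL the 1.8 run actually selects with default parameters, assuming a stock Open MPI install (ompi_info plus the framework _base_verbose MCA parameters; osu_latency-18 is just the local binary name used above):

  # show whether an mxm component was built into this Open MPI at all
  ompi_info | grep -i mxm

  # ask the PML framework to report which component it selects at run time
  # (ob1 = BTL path, e.g. openib; cm = MTL path, e.g. mxm)
  mpirun -np 2 --map-by node -mca pml_base_verbose 10 ./osu_latency-18

  # force the ob1/openib path for comparison, as in the quoted run
  mpirun -np 2 --map-by node -mca pml ob1 -mca btl openib,self ./osu_latency-18

That at least separates "mxm isn't being selected by default" from "mxm is selected but the defaults are slow".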