Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-10-07 Thread Dave Love
Mike Dubman writes:
> these flags available in master and v1.10 branches and make sure that ranks
> to core allocation is done starting from cpu socket closer to the HCA.
I'm confused by the 1.8.8 below, then. I haven't tried 1.10 since it breaks binary compatibility and seemed to have core bin

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-10-06 Thread Mike Dubman
these flags available in master and v1.10 branches and make sure that ranks to core allocation is done starting from cpu socket closer to the HCA. Of course you can have same effect with taskset.
On Mon, Oct 5, 2015 at 8:46 PM, Dave Love wrote:
> Mike Dubman writes:
>
> > what is your command

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-10-05 Thread Dave Love
Mike Dubman writes:
> what is your command line and setup? (ofed version, distro)
>
> This is what was just measured w/ fdr on haswell with v1.8.8 and mxm and UD
>
> + mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy
> dist:span -x MXM_RDMA_PORTS=mlx5_3:1 -mca rmaps_dist_dev

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-09-30 Thread Dave Love
I wrote:
> I'll try some variations like that when I can get complete nodes on the
> chassis.
It turns out that adding --mca mtl ^mxm to the 1.8 case gives results in line with 1.6 as best as I can estimate the variation (error bars -- we've heard of them). It makes no difference to 1.6 whether

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-09-30 Thread Dave Love
Mike Dubman writes:
> what is your command line and setup? (ofed version, distro)
It's on up-to-date SL6 (so using whatever RHEL6 ships) running the commands below for the 1.6 and 1.8 cases respectively. The HCA is reported as mlx4_0. Core binding is configured for 1.6. I think they both had

Re: [OMPI users] worse latency in 1.8 c.f. 1.6

2015-09-29 Thread Mike Dubman
what is your command line and setup? (ofed version, distro)
This is what was just measured w/ fdr on haswell with v1.8.8 and mxm and UD
+ mpirun -np 2 -bind-to core -display-map -mca rmaps_base_mapping_policy dist:span -x MXM_RDMA_PORTS=mlx5_3:1 -mca rmaps_dist_device mlx5_3:1 -x MXM_TLS=self,sh

[OMPI users] worse latency in 1.8 c.f. 1.6

2015-09-29 Thread Dave Love
I've just compared IB p2p latency between versions 1.6.5 and 1.8.8. I'm surprised to find that 1.8 is rather worse, as below. Assuming that's not expected, are there any suggestions for debugging it? This is with FDR Mellanox, between two Sandy Bridge nodes on the same blade chassis switch. The r