Hi Martin,

MXM's default transport is UD (MXM_TLS=*ud*,shm,self), which is scalable when
running large applications. RC (MXM_TLS=*rc*,shm,self) is recommended for
microbenchmarks and very small scale applications.
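If you want to try RC from the mpirun command line, something like this should
do it (just a sketch: the rank count, the mapping and the IMB binary path are
placeholders for your setup; "-mca pml cm -mca mtl mxm" would exercise the MXM
MTL instead of yalla):

  # pick the yalla PML and force MXM onto the RC transports;
  # -x exports the MXM_* variables to the remote ranks
  mpirun -np 2 --map-by node \
         -mca pml yalla \
         -x MXM_TLS=rc,shm,self \
         -x MXM_RDMA_PORTS=mlx4_0:1 \
         ./IMB-MPI1 PingPong Sendrecv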
Yes, that max seg size setting is too small (see the P.S. at the end of this
message for a sketch of how to raise it). Did you check any message-rate
benchmarks (like osu_mbw_mr) with MXM? A virtualization environment will have
some overhead; see a performance comparison with MVAPICH here:
http://mvapich.cse.ohio-state.edu/performance/v-pt_to_pt/

On Fri, Aug 19, 2016 at 3:11 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca> wrote:

> Hi Devendar,
>
> Thank you for your answer.
>
> Setting MXM_TLS=rc,shm,self does improve the speed of MXM (both latency
> and bandwidth):
>
> without MXM_TLS
>
>   comm      lat_min    bw_max     bw_max
>             pingpong   pingpong   sendrecv
>             (us)       (MB/s)     (MB/s)
>   -------------------------------------------
>   openib      1.79     5827.93    11552.4
>   mxm         2.23     5191.77     8201.76
>   yalla       2.18     5200.55     8109.48
>
> with MXM_TLS=rc,shm,self
>
>   comm      lat_min    bw_max     bw_max
>             pingpong   pingpong   sendrecv
>             (us)       (MB/s)     (MB/s)
>   -------------------------------------------
>   openib      1.79     6021.83    11529
>   mxm         1.78     5936.92    11168.5
>   yalla       1.78     5944.86    11375
>
> Note 1: MXM_RDMA_PORTS=mlx4_0:1 and the MCA parameter
> btl_openib_include_if=mlx4_0 were set in both cases.
>
> Note 2: The reported bandwidths are not very accurate; results can easily
> vary by 7% from one run to another.
>
> We see that the performance of MXM is now very similar to the performance
> of openib for these IMB tests.
>
> However, an error is now reported a few times when MXM_TLS is set:
>
>   sys.c:468  MXM  ERROR A new segment was to be created and size < SHMMIN
>   or size > SHMMAX, or the new segment was to be created. A segment with
>   given key existed, but size is greater than the size of that segment.
>   Please check limits by 'ipcs -l'.
>
> "ipcs -l" reports, among other things, that:
>
>   max seg size (kbytes) = 32768
>
> By the way, is it too small?
>
> Now if we run /opt/mellanox/mxm/mxm_perftest we get:
>
>                                              without     with
>                                              MXM_TLS     MXM_TLS
>   ------------------------------------------------------------
>   avg send_lat (us)                            1.626       1.321
>
>   avg send_bw -s 4000000 (MB/s)              5219.51     5514.04
>   avg bidir send_bw -s 4000000 -b (MB/s)     5283.13     5514.45
>
> Note: the -b flag for bidirectional bandwidth doesn't seem to affect the
> result.
>
> Again it is an improvement in both latency and bandwidth.
>
> However, a warning is reported on the server side when MXM_TLS is set and
> the send_lat test is run:
>
>   icb_ep.c:287  MXM  WARN  The min value for CIB_RX_QUEUE_LEN is 2048.
>
> Note: setting the undocumented env variable MXM_CIB_RX_QUEUE_LEN=2048
> removes the warning but doesn't affect the send latency.
>
> * * *
>
> So now the results are better: MXM performs as well as the regular openib
> in terms of latency and bandwidth (I didn't check the overlap capability,
> though). But I'm not really impressed. I was expecting MXM (especially
> when used by yalla) to be a little better than openib. Also, the latency
> of openib, mxm and yalla at 1.8 us seems too high. With a configuration
> like ours, we should get something closer to 1 us.
>
> Does anyone have an idea?
>
> Don't forget that this cluster uses LXC containers with SR-IOV enabled
> for the InfiniBand adapter.
>
> Martin Audet
>
> > Hi Martin,
> >
> > Can you check if it is any better with "-x MXM_TLS=rc,shm,self" ?
> >
> > -Devendar

--
-Devendar
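P.S. In case it is useful, here is roughly what I had in mind for the shmmax
limit and for a message-rate run. This is only a sketch: the 2 GB value and
the osu_mbw_mr path are examples, pick whatever fits your nodes and your OSU
micro-benchmarks install:

  # on every compute node, as root: raise the SysV shared memory segment limit
  # (example value: 2 GB; kernel.shmall, counted in pages, may need raising too)
  sysctl -w kernel.shmmax=2147483648
  # put the same setting in /etc/sysctl.conf so it survives a reboot

  # message rate between two nodes, one rank per node
  mpirun -np 2 --map-by node -mca pml yalla \
         -x MXM_TLS=rc,shm,self -x MXM_RDMA_PORTS=mlx4_0:1 \
         ./osu_mbw_mr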