Hi Devendar,

Thank you for your answer.
Setting MXM_TLS=rc,shm,self does improve the speed of MXM (both latency and bandwidth):

  without MXM_TLS

             lat_min    bw_max     bw_max
  comm       pingpong   pingpong   sendrecv
             (us)       (MB/s)     (MB/s)
  -----------------------------------------
  openib     1.79       5827.93    11552.4
  mxm        2.23       5191.77    8201.76
  yalla      2.18       5200.55    8109.48

  with MXM_TLS=rc,shm,self

             lat_min    bw_max     bw_max
  comm       pingpong   pingpong   sendrecv
             (us)       (MB/s)     (MB/s)
  -----------------------------------------
  openib     1.79       6021.83    11529
  mxm        1.78       5936.92    11168.5
  yalla      1.78       5944.86    11375

Note 1: MXM_RDMA_PORTS=mlx4_0:1 and the MCA parameter btl_openib_include_if=mlx4_0 were set in both cases.

Note 2: The bandwidth numbers are not very accurate; they can easily vary by 7% from one run to another.

We see that the performance of MXM is now very similar to that of openib for these IMB tests.

However, an error is now reported a few times when MXM_TLS is set:

  sys.c:468  MXM  ERROR A new segment was to be created and size < SHMMIN or size > SHMMAX, or the new segment was to be created. A segment with given key existed, but size is greater than the size of that segment. Please check limits by 'ipcs -l'.

"ipcs -l" reports, among other things, that:

  max seg size (kbytes) = 32768

By the way, is that too small?

Now if we run /opt/mellanox/mxm/mxm_perftest we get:

                                              without     with
                                              MXM_TLS     MXM_TLS
  ----------------------------------------------------------------
  avg send_lat (us)                           1.626       1.321
  avg send_bw -s 4000000 (MB/s)               5219.51     5514.04
  avg bidir send_bw -s 4000000 -b (MB/s)      5283.13     5514.45

Note: the -b flag for bidirectional bandwidth doesn't seem to affect the result.

Again it is an improvement, in terms of both latency and bandwidth. However, a warning is reported on the server side when MXM_TLS is set and the send_lat test is run:

  icb_ep.c:287  MXM  WARN  The min value for CIB_RX_QUEUE_LEN is 2048.

Note: setting the undocumented environment variable MXM_CIB_RX_QUEUE_LEN=2048 removes the warning but doesn't affect the send latency.

* * *

So the results are now better: MXM performs as well as the regular openib in terms of latency and bandwidth (I didn't check the overlap capability, though). But I'm not really impressed. I was expecting MXM (especially when used through yalla) to be a little better than openib.

Also, the latency of around 1.8 us for openib, mxm and yalla alike seems too high. With a configuration like ours, we should get something closer to 1 us. Does anyone have an idea?

Don't forget that this cluster uses LXC containers with SR-IOV enabled for the InfiniBand adapter.

Martin Audet

> Hi Martin,
>
> Can you check if it is any better with "-x MXM_TLS=rc,shm,self" ?
>
> -Devendar