Hi Devendar,

Thank you for your answer.

Setting MXM_TLS=rc,shm,self does improve the speed of MXM (both latency and 
bandwidth):

 without MXM_TLS

    comm       lat_min      bw_max      bw_max
               pingpong     pingpong    sendrecv
               (us)         (MB/s)      (MB/s)
    -------------------------------------------
    openib     1.79         5827.93    11552.4
    mxm        2.23         5191.77     8201.76
    yalla      2.18         5200.55     8109.48


 with MXM_TLS=rc,shm,self

    comm       lat_min      bw_max      bw_max
               pingpong     pingpong    sendrecv
               (us)         (MB/s)      (MB/s)
    -------------------------------------------
    openib     1.79         6021.83    11529
    mxm        1.78         5936.92    11168.5
    yalla      1.78         5944.86    11375


Note 1: MXM_RDMA_PORTS=mlx4_0:1 and the MCA parameter 
btl_openib_if_include=mlx4_0 were set in both cases.

Note 2: The bandwidth figures are not very precise; results can easily vary by 
7% from one run to another.
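
Note 3: for completeness, the yalla runs were launched with something along 
these lines (hostnames, process placement and the IMB binary path below are 
simplified placeholders; the mxm runs used "--mca pml cm --mca mtl mxm" 
instead of "--mca pml yalla", and the openib runs used "--mca pml ob1" 
together with the btl_openib_if_include parameter):

    mpirun -np 2 -H node1,node2 \
           -x MXM_TLS=rc,shm,self \
           -x MXM_RDMA_PORTS=mlx4_0:1 \
           --mca pml yalla \
           ./IMB-MPI1 PingPong Sendrecv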

We see that the performance of MXM is now very similar to that of openib for 
these IMB tests.

However, an error is now reported a few times when MXM_TLS is set:

sys.c:468  MXM  ERROR A new segment was to be created and size < SHMMIN or size 
> SHMMAX, or the new segment was to be created. A segment with given key 
existed, but size is greater than the size of that segment. Please check limits 
by 'ipcs -l'.

"ipcs -l" reports among other things that:

  max seg size (kbytes) = 32768

By the way, is it too small?
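
If that limit turns out to be the problem, I suppose it could be raised with 
something like the following (the 1 GB value is only an example, not a tuned 
recommendation, and I'm not sure whether it has to be done on the host or 
inside the LXC containers given the separate IPC namespaces):

    # current limits (kernel.shmmax is in bytes, kernel.shmall in pages)
    sysctl kernel.shmmax kernel.shmall

    # raise SHMMAX to 1 GB on the running kernel (example value)
    sysctl -w kernel.shmmax=1073741824

    # add "kernel.shmmax = 1073741824" to /etc/sysctl.conf to make it persistent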


Now if we run /opt/mellanox/mxm/mxm_perftest we get:

                                          without      with
                                          MXM_TLS      MXM_TLS
  ------------------------------------------------------------
  avg send_lat                    (us)    1.626        1.321

  avg send_bw       -s 4000000    (MB/s)  5219.51      5514.04
  avg bidir send_bw -s 4000000 -b (MB/s)  5283.13      5514.45

Note: the -b flag for bidirectional bandwidth doesn't seem to affect the result.

Again, this is an improvement in both latency and bandwidth.

However, a warning is reported on the server side when MXM_TLS is set and the 
send_lat test is run:

 icb_ep.c:287   MXM  WARN  The min value for CIB_RX_QUEUE_LEN is 2048.

Note: setting the undocumented environment variable MXM_CIB_RX_QUEUE_LEN=2048 
removes the warning but doesn't affect the send latency.
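
Concretely, the warning disappears when the server side is started with the 
variable set, e.g. (same path as above):

    MXM_TLS=rc,shm,self MXM_CIB_RX_QUEUE_LEN=2048 /opt/mellanox/mxm/mxm_perftest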


 * * *

So now the results are better: MXM performs as well as the regular openib in 
terms of latency and bandwidth (I didn't check the overlap capability though). 
But I'm not really impressed. I was expecting MXM (especially when used through 
yalla) to be a little better than openib. Also, the latency of openib, mxm and 
yalla, all around 1.8 us, seems too high. With a configuration like ours, we 
should get something closer to 1 us.

Does anyone have an idea?

Don't forget that this cluster uses LXC containers with SR-IOV enabled for the 
InfiniBand adapter.
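
To see how much of that 1.8 us comes from the SR-IOV/LXC layer rather than from 
MPI/MXM itself, I suppose we could also compare against a raw verbs-level 
measurement on the same port with ib_send_lat from the standard perftest 
package (hostname below is a placeholder), e.g.:

    # server side, in one container
    ib_send_lat -d mlx4_0

    # client side, in the other container
    ib_send_lat -d mlx4_0 node1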

Martin Audet


> Hi Martin,
>
> Can you check if it is any better with  "-x MXM_TLS=rc,shm,self" ?
>
> -Devendar

