Hi Martin,

The MXM default transport is UD (MXM_TLS=ud,shm,self), which scales well
for large applications. RC (MXM_TLS=rc,shm,self) is recommended for
microbenchmarks and very small-scale applications.
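
As a rough illustration of switching between the two transport lists, here is a minimal Python sketch that only builds the mpirun command line. The helper name and the small_scale flag are hypothetical; the transport strings and the -x export are the ones mentioned in this thread, and osu_mbw_mr is just a placeholder binary.

```python
# Transport lists quoted in this thread; the helper itself is hypothetical.
MXM_TLS_UD = "ud,shm,self"  # MXM default, meant for large jobs
MXM_TLS_RC = "rc,shm,self"  # suggested for microbenchmarks / very small jobs


def mpirun_command(app_argv, nprocs, small_scale):
    """Build an mpirun argv that exports MXM_TLS to every rank with -x."""
    tls = MXM_TLS_RC if small_scale else MXM_TLS_UD
    return ["mpirun", "-np", str(nprocs), "-x", f"MXM_TLS={tls}", *app_argv]


if __name__ == "__main__":
    # Only prints the command; nothing is launched here.
    cmd = mpirun_command(["./osu_mbw_mr"], nprocs=2, small_scale=True)
    print(" ".join(cmd))
```

What counts as "small scale" is left to the caller on purpose, since no rank-count threshold is given here.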

Yes, the max seg size setting is too small.
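
To put the reported limit in more familiar units, a small sketch that parses the "max seg size" line of `ipcs -l` and converts it. The function name is hypothetical, and the sample string is the line quoted below; no particular target value is implied.

```python
import re


def max_seg_size_kbytes(ipcs_output):
    """Return the 'max seg size (kbytes)' value from `ipcs -l` text, or None."""
    match = re.search(r"max seg size \(kbytes\)\s*=\s*(\d+)", ipcs_output)
    return int(match.group(1)) if match else None


# Line copied from the report quoted below.
sample = "  max seg size (kbytes) = 32768\n"
kbytes = max_seg_size_kbytes(sample)
print(kbytes * 1024, "bytes =", kbytes // 1024, "MiB")
```

On a live node the same function could be fed the output of subprocess.run(["ipcs", "-l"], capture_output=True, text=True).stdout. On Linux this limit corresponds to the kernel.shmmax sysctl.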

Did you check any message rate benchmarks (like osu_mbw_mr) with MXM?

The virtualization environment will add some overhead. See some
performance comparisons with MVAPICH here:
http://mvapich.cse.ohio-state.edu/performance/v-pt_to_pt/





On Fri, Aug 19, 2016 at 3:11 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca>
wrote:

> Hi Devendar,
>
> Thank you for your answer.
>
> Setting MXM_TLS=rc,shm,self does improve the speed of MXM (both latency
> and bandwidth):
>
> without MXM_TLS
>
>     comm       lat_min      bw_max      bw_max
>                pingpong     pingpong    sendrecv
>                (us)         (MB/s)      (MB/s)
>     -------------------------------------------
>     openib     1.79         5827.93    11552.4
>     mxm        2.23         5191.77     8201.76
>     yalla      2.18         5200.55     8109.48
>
>
> with MXM_TLS=rc,shm,self
>
>     comm       lat_min      bw_max      bw_max
>                pingpong     pingpong    sendrecv
>                (us)         (MB/s)      (MB/s)
>     -------------------------------------------
>     openib     1.79         6021.83    11529
>     mxm        1.78         5936.92    11168.5
>     yalla      1.78         5944.86    11375
>
>
> Note 1: MXM_RDMA_PORTS=mlx4_0:1 and the MCA parameter
> btl_openib_include_if=mlx4_0 were set in both cases.
>
> Note 2: The reported bandwidths are not very accurate. Bandwidth results
> can easily vary by 7% from one run to another.
>
> We see that the performance of MXM is now very similar to the performance
> of openib for these IMB tests.
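
For reference, the sendrecv gap to openib can be set against the 7% run-to-run variation with a few lines of Python. This is only a sketch over the numbers in the two tables above; the function and dictionary names are made up for illustration.

```python
def relative_gap(value, reference):
    """Signed fractional difference of value with respect to reference."""
    return (value - reference) / reference


# bw_max sendrecv (MB/s), copied from the two tables above.
sendrecv = {
    "default": {"openib": 11552.4, "mxm": 8201.76, "yalla": 8109.48},
    "rc": {"openib": 11529.0, "mxm": 11168.5, "yalla": 11375.0},
}

for label, row in sendrecv.items():
    for comm in ("mxm", "yalla"):
        gap = relative_gap(row[comm], row["openib"])
        print(f"{label:8s} {comm:6s} {gap:+.1%}")
```

With the default transport, both mxm and yalla come out roughly 29% to 30% below openib; with MXM_TLS=rc,shm,self the gap drops to about 1% to 3%, which is inside the 7% variation mentioned in Note 2.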
>
> However an error is now reported a few times when MXM_TLS is set:
>
> sys.c:468  MXM  ERROR A new segment was to be created and size < SHMMIN or
> size > SHMMAX, or the new segment was to be created. A segment with given
> key existed, but size is greater than the size of that segment. Please
> check limits by 'ipcs -l'.
>
> "ipcs -l" reports among other things that:
>
>   max seg size (kbytes) = 32768
>
> By the way, is it too small?
>
>
> Now if we run /opt/mellanox/mxm/mxm_perftest we get:
>
>                                           without      with
>                                           MXM_TLS      MXM_TLS
>   ------------------------------------------------------------
>   avg send_lat                    (us)    1.626        1.321
>
>   avg send_bw       -s 4000000    (MB/s)  5219.51      5514.04
>   avg bidir send_bw -s 4000000 -b (MB/s)  5283.13      5514.45
>
> Note: the -b for bidirectional bandwidth doesn't seem to affect the result.
>
> Again, it is an improvement in terms of both latency and bandwidth.
>
> However, when MXM_TLS is set, a warning is reported on the server side
> when the send_lat test is run:
>
> icb_ep.c:287   MXM  WARN  The min value for CIB_RX_QUEUE_LEN is 2048.
>
> Note: setting the undocumented env variable MXM_CIB_RX_QUEUE_LEN=2048
> removes the warning but doesn't affect the send latency.
>
>
> * * *
>
> So now the results are better: MXM performs as well as the regular openib
> in terms of latency and bandwidth (I didn't check the overlap capability,
> though). But I'm not really impressed. I was expecting MXM (especially when
> used by yalla) to be a little better than openib. Also, the latency of
> openib, mxm and yalla, at 1.8 us, seems too high. With a configuration
> like ours, we should get something closer to 1 us.
>
> Does anyone have an idea?
>
> Don't forget that this cluster uses LXC containers with SR-IOV enabled for
> the InfiniBand adapter.
>
> Martin Audet
>
>
> > Hi Martin,
> >
> > Can you check if it is any better with  "-x MXM_TLS=rc,shm,self" ?
> >
> > -Devendar
>
>
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>



-- 


-Devendar