Hello All,

I have a RoCE interoperability event starting next week, and I am trying to
help a new vendor get their device ready. I was wondering if anyone had any
ideas that might help.

I am using:

* Open MPI 2.1

* Intel MPI Benchmarks 2018

* OFED 3.18 (requirement from vendor)

* SLES 11 SP3 (requirement from vendor)

 

The problem seems to be that the device does not handle larger message sizes
well. I am sure the vendor will be working on that, but in the meantime I am
hoping there may be a way to complete an IMB run with some Open MPI parameter
tweaking.
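
For concreteness, the kind of tweaking I have in mind is along these lines.
The host names, process count, and parameter values below are just
placeholders and guesses on my part, not anything I know to be correct:

    # Raise the openib BTL timeout / retry count that the abort message
    # refers to (host names and values are placeholders):
    mpirun -np 2 -host node01,node02 \
        --mca btl openib,self,vader \
        --mca btl_openib_ib_timeout 30 \
        --mca btl_openib_ib_retry_count 7 \
        ./IMB-MPI1 Sendrecv

    # Or cap IMB at 1 MiB messages (2^20) so the run stays below the
    # problem sizes:
    mpirun -np 2 -host node01,node02 ./IMB-MPI1 -msglog 20 Sendrecv

If those are the wrong knobs for a RoCE setup, corrections are very welcome.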

Sample of IMB output from a Sendrecv benchmark:

 

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
       262144          160       131.07       132.24       131.80      3964.56
       524288           80       277.42       284.57       281.57      3684.71
      1048576           40       461.16       474.83       470.02      4416.59
      2097152            3      1112.15   4294965.49   2147851.04         0.98
      4194304            2      2815.25   8589929.73   3222731.54         0.98

 

The last two rows (2 MB and 4 MB messages) look like the problematic results.
This happens on many of the benchmarks at larger message sizes, and it causes
either a major slowdown or a job abort with the error:

 

The InfiniBand retry count between two MPI processes has been exceeded.

 

If anyone has any thoughts on how I can complete the benchmarks without the
job aborting, I would appreciate it. If anyone has ideas as to why a RoCE
device might show this issue, I would welcome whatever information you can
offer. If more data is required, please let me know what is relevant.

 

 

Thank you,

Brendan T. W. Myers

 

 
