ob1/openib is RC (reliable connection) based, which has scalability issues; MXM 1.1 is UD (unreliable datagram) based and is designed to kick in at scale. We observe MXM outperforming ob1 on 8+ nodes.
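For example, a trivial launcher wrapper along these lines (untested sketch; the script name, node counting, and the 8-node threshold are just for illustration, and it reuses only the flags from your runs) could pick the transport per job size:

#!/bin/bash
# run_osu.sh -- hypothetical wrapper: ob1 below 8 nodes, MXM at/above 8
HOSTFILE=all_hosts
NP=$1                                                # total ranks, e.g. 4
NODES=$(awk '{print $1}' "$HOSTFILE" | sort -u | wc -l)  # distinct hosts in the hostfile

if [ "$NODES" -ge 8 ]; then
    # larger runs: UD-based MXM via the cm PML
    PML_ARGS="--mca pml cm --mca mtl mxm"
else
    # smaller runs: RC-based openib via the ob1 PML
    PML_ARGS="--mca pml ob1"
fi

/opt/openmpi/1.6.0/bin/mpiexec -np "$NP" $PML_ARGS --mca btl ^tcp \
    --mca mpi_use_pinned 1 -hostfile "$HOSTFILE" \
    ./osu-micro-benchmarks/osu_mbw_mr

You can also confirm that the mxm MTL component is present in your install with "ompi_info | grep mxm".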
We will update the docs as you mentioned, thanks.

Regards

On Thu, May 10, 2012 at 4:30 PM, Derek Gerstmann <derek.gerstm...@uwa.edu.au> wrote:

> On May 9, 2012, at 7:41 PM, Mike Dubman wrote:
>
> > you need latest OMPI 1.6.x and latest MXM
> > (ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar)
>
> Excellent! Thanks for the quick response! Using MXM v1.1.1067 against
> OMPI v1.6.x did the trick. Please (!!!) add a note to the OMPI 1.6.x docs
> to help out other users -- there's zero mention of this anywhere that I
> could find from scouring the archives and source code.
>
> Sadly, performance isn't what we'd expect. OB1 is consistently
> outperforming CM MXM.
>
> Are there any suggested configuration settings? We tried all the obvious
> ones listed in the OMPI wiki and mailing list archives, but few had much
> of an effect.
>
> We seem to do better with the OB1 openib BTL than with the lower-level
> CM MXM. Any suggestions?
>
> Here are the numbers from the OSU Micro-Benchmarks (MBW_MR test) running
> on 2 pairs, i.e. 4 separate hosts, each using one Mellanox ConnectX card,
> single port, single switch:
>
> -- OB1
>
> /opt/openmpi/1.6.0/bin/mpiexec -np 4 --mca pml ob1 --mca btl ^tcp --mca
> mpi_use_pinned 1 -hostfile all_hosts ./osu-micro-benchmarks/osu_mbw_mr
> # OSU MPI Multiple Bandwidth / Message Rate Test v3.6
> # [ pairs: 2 ] [ window size: 64 ]
> # Size        MB/s          Messages/s
> 1             2.91          2909711.73
> 2             5.97          2984274.11
> 4             11.70         2924292.78
> 8             23.00         2874502.93
> 16            44.75         2796639.64
> 32            89.49         2796639.64
> 64            175.98        2749658.96
> 128           292.41        2284459.86
> 256           527.84        2061874.61
> 512           961.65        1878221.77
> 1024          1669.06       1629943.87
> 2048          2220.43       1084193.45
> 4096          2906.57       709611.68
> 8192          3017.65       368365.70
> 16384         5225.97       318967.95
> 32768         5418.98       165374.23
> 65536         5998.07       91523.27
> 131072        6031.69       46018.16
> 262144        6063.38       23129.97
> 524288        5971.77       11390.24
> 1048576       5788.75       5520.59
> 2097152       5791.39       2761.55
> 4194304       5820.60       1387.74
>
> -- MXM
>
> /opt/openmpi/1.6.0/bin/mpiexec -np 4 --mca pml cm --mca mtl mxm --mca
> btl ^tcp --mca mpi_use_pinned 1 -hostfile all_hosts
> ./osu-micro-benchmarks/osu_mbw_mr
> # OSU MPI Multiple Bandwidth / Message Rate Test v3.6
> # [ pairs: 2 ] [ window size: 64 ]
> # Size        MB/s          Messages/s
> 1             2.07          2074863.43
> 2             4.14          2067830.81
> 4             10.57         2642471.39
> 8             23.16         2895275.37
> 16            38.73         2420627.22
> 32            66.77         2086718.41
> 64            147.87        2310414.05
> 128           284.94        2226109.85
> 256           537.27        2098709.64
> 512           1041.91       2034989.43
> 1024          1930.93       1885676.34
> 2048          1998.68       975916.00
> 4096          2880.72       703299.77
> 8192          3608.45       440484.17
> 16384         4027.15       245797.51
> 32768         4464.85       136256.47
> 65536         4594.22       70102.23
> 131072        4655.62       35519.55
> 262144        4671.56       17820.58
> 524288        4604.16       8781.74
> 1048576       4635.51       4420.77
> 2097152       3575.17       1704.78
> 4194304       2828.19       674.29
>
> Thanks!
>
> -[dg]
>
> Derek Gerstmann, PhD Student
> The University of Western Australia (UWA)
>
> w: http://local.ivec.uwa.edu.au/~derek
> e: derek.gerstmann [at] icrar.org
>
> On May 9, 2012, at 7:41 PM, Mike Dubman wrote:
>
> > you need latest OMPI 1.6.x and latest MXM
> > (ftp://bgate.mellanox.com/hpc/mxm/v1.1/mxm_1.1.1067.tar)
> >
> > On Wed, May 9, 2012 at 6:02 AM, Derek Gerstmann
> > <derek.gerstm...@uwa.edu.au> wrote:
> > What versions of OpenMPI and the Mellanox MXM libraries have been
> > tested and verified to work?
> >
> > We are currently trying to build OpenMPI v1.5.5 against MXM 1.0.601
> > (included in the MLNX_OFED_LINUX-1.5.3-3.0.0 distribution) and are
> > getting build errors.
> >
> > Specifically, there's a single undefined type (mxm_wait_t) being used
> > in the OpenMPI tree:
> >
> > openmpi-1.5.5/ompi/mca/mtl/mxm/mtl_mxm_send.c:44    mxm_wait_t wait;
> >
> > There is no mxm_wait_t defined anywhere in the current MXM API
> > (/opt/mellanox/mxm/include/mxm/api), which suggests a version mismatch.
> >
> > The OpenMPI v1.6 branch has a note in the readme saying "Minor Fixes
> > for Mellanox MXM" were added, but the same undefined mxm_wait_t is
> > still being used.
> >
> > What versions of OpenMPI and MXM are verified to work?
> >
> > Thanks!
> >
> > -[dg]
> >
> > Derek Gerstmann, PhD Student
> > The University of Western Australia (UWA)
> >
> > w: http://local.ivec.uwa.edu.au/~derek
> > e: derek.gerstmann [at] icrar.org
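For anyone hitting the mxm_wait_t build error quoted above, a quick pre-configure probe along these lines can tell you whether the installed MXM headers still define that type (rough sketch; the header name mxm/api/mxm_api.h is an assumption -- point it at whatever actually ships under /opt/mellanox/mxm/include):

#!/bin/bash
# Hypothetical probe: does the installed MXM define mxm_wait_t?
MXM_INC=/opt/mellanox/mxm/include       # adjust to your MXM install

cat > conftest.c <<'EOF'
#include <mxm/api/mxm_api.h>   /* assumed header name */
mxm_wait_t w;                  /* the type OMPI 1.5.5/1.6.x refers to */
int main(void) { return 0; }
EOF

if cc -I"$MXM_INC" -c conftest.c -o conftest.o 2>/dev/null; then
    echo "mxm_wait_t found: this MXM should match OMPI's mxm MTL"
else
    echo "mxm_wait_t missing: likely a version mismatch; pair MXM 1.1.x with OMPI 1.6.x"
fi
rm -f conftest.c conftest.o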