Thanks Jeff, it worked. The latency and bandwidth benchmarks are now performing as expected on both Ethernet and InfiniBand.
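
For the archives, the run that now gives the expected Ethernet numbers is along the lines of the sketch below, following Jeff's suggestion of forcing the ob1 PML. The machine file, the em1 interface name, and the ./latency.ompi binary are simply the ones from my earlier mails, so treat the exact command as a sketch rather than a verified transcript:

    # Force the ob1 PML so only BTLs are considered (keeping the MXM MTL
    # out of the picture), restrict the BTLs to tcp/self/sm, and pin TCP
    # traffic to the Gigabit Ethernet interface em1.
    mpirun -np 2 -machinefile machines -map-by node \
        --mca pml ob1 --mca btl tcp,self,sm \
        --mca btl_tcp_if_include em1 ./latency.ompi

Explicitly listing tcp,self,sm instead of excluding openib avoids relying on exclusion rules, and --mca pml ob1 prevents an MTL-based transport (such as MXM) from being selected instead of the BTLs.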
--Ansar

On Wed, Sep 10, 2014 at 3:34 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> Are you inadvertently using the MXM MTL? That's an alternate Mellanox
> transport that may activate itself, even if you've disabled the openib
> BTL. Try this:
>
>     mpirun --mca pml ob1 --mca btl ^openib ...
>
> This forces the use of the ob1 PML (which forces the use of the BTLs,
> not the MTLs), and then disables the openib BTL.
>
>
> On Sep 9, 2014, at 10:24 AM, Muhammad Ansar Javed
> <muhammad.an...@seecs.edu.pk> wrote:
>
> > Hi,
> >
> > I am currently conducting some testing on a system with Gigabit
> > Ethernet and InfiniBand interconnects. Both latency and bandwidth
> > benchmarks perform as expected on the InfiniBand interconnect, but the
> > Ethernet interconnect is achieving far higher performance than
> > expected: Ethernet and InfiniBand are achieving equivalent performance.
> >
> > For some reason it looks like Open MPI (v1.8.1) is using the
> > InfiniBand interconnect rather than Gigabit Ethernet, or the TCP
> > communication is being emulated over the InfiniBand interconnect.
> >
> > Here are the latency and bandwidth benchmark results:
> >
> > #---------------------------------------------------
> > # Benchmarking PingPong
> > # processes = 2
> > # map-by node
> > #---------------------------------------------------
> >
> > Hello, world. I am 1 on node124
> > Hello, world. I am 0 on node123
> > Size    Latency (usec)    Bandwidth (Mbps)
> > 1       1.65              4.62
> > 2       1.67              9.16
> > 4       1.66              18.43
> > 8       1.66              36.74
> > 16      1.85              66.00
> > 32      1.83              133.28
> > 64      1.83              266.36
> > 128     1.88              519.10
> > 256     1.99              982.29
> > 512     2.23              1752.37
> > 1024    2.58              3026.98
> > 2048    3.32              4710.76
> >
> > I read some of the FAQs and noted that Open MPI prefers the fastest
> > available interconnect. In an effort to force it to use the Gigabit
> > Ethernet interconnect, I ran it as follows:
> >
> > 1. mpirun -np 2 -machinefile machines -map-by node --mca btl tcp
> >    --mca btl_tcp_if_include em1 ./latency.ompi
> > 2. mpirun -np 2 -machinefile machines -map-by node --mca btl tcp,self,sm
> >    --mca btl_tcp_if_include em1 ./latency.ompi
> > 3. mpirun -np 2 -machinefile machines -map-by node --mca btl ^openib
> >    --mca btl_tcp_if_include em1 ./latency.ompi
> > 4. mpirun -np 2 -machinefile machines -map-by node --mca btl ^openib
> >    ./latency.ompi
> >
> > None of them resulted in a significantly different benchmark output.
> >
> > I am using Open MPI by loading a module in a clustered environment and
> > don't have admin access. It is configured for both TCP and OpenIB
> > (confirmed from ompi_info). After trying all of the above methods
> > without success, I installed Open MPI v1.8.2 in my home directory and
> > disabled openib with the following configure options:
> >
> > --disable-openib-control-hdr-padding --disable-openib-dynamic-sl
> > --disable-openib-connectx-xrc --disable-openib-udcm
> > --disable-openib-rdmacm --disable-btl-openib-malloc-alignment
> > --disable-io-romio --without-openib --without-verbs
> >
> > Now openib is not enabled (confirmed from ompi_info) and there is no
> > "openib.so" file in the $prefix/lib/openmpi directory either. Still,
> > the above mpirun commands are getting the same latency and bandwidth
> > as InfiniBand.
> >
> > I tried mpirun in verbose mode with the following command, and here is
> > the output.
> >
> > Command:
> > mpirun -np 2 -machinefile machines -map-by node --mca btl tcp
> >     --mca btl_base_verbose 30 --mca btl_tcp_if_include em1 ./latency.ompi
> >
> > Output:
> > [node123.prv.sciama.cluster:88310] mca: base: components_register: registering btl components
> > [node123.prv.sciama.cluster:88310] mca: base: components_register: found loaded component tcp
> > [node123.prv.sciama.cluster:88310] mca: base: components_register: component tcp register function successful
> > [node123.prv.sciama.cluster:88310] mca: base: components_open: opening btl components
> > [node123.prv.sciama.cluster:88310] mca: base: components_open: found loaded component tcp
> > [node123.prv.sciama.cluster:88310] mca: base: components_open: component tcp open function successful
> > [node124.prv.sciama.cluster:90465] mca: base: components_register: registering btl components
> > [node124.prv.sciama.cluster:90465] mca: base: components_register: found loaded component tcp
> > [node124.prv.sciama.cluster:90465] mca: base: components_register: component tcp register function successful
> > [node124.prv.sciama.cluster:90465] mca: base: components_open: opening btl components
> > [node124.prv.sciama.cluster:90465] mca: base: components_open: found loaded component tcp
> > [node124.prv.sciama.cluster:90465] mca: base: components_open: component tcp open function successful
> > Hello, world. I am 1 on node124
> > Hello, world. I am 0 on node123
> > Size    Latency (usec)    Bandwidth (Mbps)
> > 1       4.18              1.83
> > 2       3.66              4.17
> > 4       4.08              7.48
> > 8       3.12              19.57
> > 16      3.83              31.84
> > 32      3.40              71.84
> > 64      4.10              118.97
> > 128     3.89              251.19
> > 256     4.22              462.77
> > 512     2.95              1325.71
> > 1024    2.63              2969.49
> > 2048    3.38              4628.29
> > [node123.prv.sciama.cluster:88310] mca: base: close: component tcp closed
> > [node123.prv.sciama.cluster:88310] mca: base: close: unloading component tcp
> > [node124.prv.sciama.cluster:90465] mca: base: close: component tcp closed
> > [node124.prv.sciama.cluster:90465] mca: base: close: unloading component tcp
> >
> > Moreover, the same benchmark applications built with MPICH work fine
> > over Ethernet and achieve the expected latency and bandwidth.
> >
> > How can this be fixed?
> >
> > Thanks for the help,
> >
> > --Ansar
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2014/09/25297.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/09/25307.php
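
A postscript for anyone who finds this thread later: it can save time to confirm up front which transport components an Open MPI build actually contains. A rough check along the lines below (the grep pattern is only one way to filter ompi_info's output):

    # List the transport components compiled into this Open MPI build.
    # A build configured --without-openib --without-verbs should show no
    # "MCA btl: openib" line; an "MCA mtl: mxm" line would explain
    # InfiniBand traffic even with the openib BTL disabled.
    ompi_info | grep -E "MCA (pml|mtl|btl):"

Combined with --mca btl_base_verbose 30 at run time (as in the output quoted above), this makes it fairly clear which transport actually carries the traffic.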