Thanks Jeff,
It worked. The latency and bandwidth benchmarks are now performing as
expected on both Ethernet and InfiniBand.

--Ansar

On Wed, Sep 10, 2014 at 3:34 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Are you inadvertently using the MXM MTL?  That's an alternate Mellanox
> transport that may activate itself, even if you've disabled the openib
> BTL.  Try this:
>
>   mpirun --mca pml ob1 --mca btl ^openib ...
>
> This forces the use of the ob1 PML (which forces the use of the BTLs, not
> the MTLs), and then disables the openib BTL.
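>
> To confirm which PML (and hence whether an MTL) is actually being used,
> you can also raise the framework verbosity; the exact output varies by
> version, but it names the components that get selected:
>
>   mpirun --mca pml_base_verbose 10 --mca btl_base_verbose 10 ...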
>
>
> On Sep 9, 2014, at 10:24 AM, Muhammad Ansar Javed <muhammad.an...@seecs.edu.pk> wrote:
>
> > Hi,
> >
> > I am currently running some tests on a system with both Gigabit
> > Ethernet and InfiniBand interconnects. The latency and bandwidth
> > benchmarks perform as expected over InfiniBand, but the Ethernet
> > interconnect is achieving far higher performance than expected; in
> > fact, Ethernet and InfiniBand are achieving equivalent performance.
> >
> > For some reason, it looks like Open MPI (v1.8.1) is either using the
> > InfiniBand interconnect rather than Gigabit Ethernet, or the TCP
> > communication is being emulated over the InfiniBand interconnect.
> >
> > Here are the latency and bandwidth benchmark results.
> > #---------------------------------------------------
> > # Benchmarking PingPong
> > # processes = 2
> > # map-by node
> > #---------------------------------------------------
> >
> > Hello, world.  I am 1 on node124
> > Hello, world.  I am 0 on node123
> > Size Latency (usec) Bandwidth (Mbps)
> > 1    1.65    4.62
> > 2    1.67    9.16
> > 4    1.66    18.43
> > 8    1.66    36.74
> > 16    1.85    66.00
> > 32    1.83    133.28
> > 64    1.83    266.36
> > 128    1.88    519.10
> > 256    1.99    982.29
> > 512    2.23    1752.37
> > 1024    2.58    3026.98
> > 2048    3.32    4710.76
> >
> > I read some of the FAQs and noted that Open MPI prefers the fastest
> > available interconnect. In an effort to force it to use the Gigabit
> > Ethernet interconnect, I ran it as follows:
> >
> > 1. mpirun -np 2 -machinefile machines -map-by node --mca btl tcp
> >    --mca btl_tcp_if_include em1 ./latency.ompi
> > 2. mpirun -np 2 -machinefile machines -map-by node --mca btl tcp,self,sm
> >    --mca btl_tcp_if_include em1 ./latency.ompi
> > 3. mpirun -np 2 -machinefile machines -map-by node --mca btl ^openib
> >    --mca btl_tcp_if_include em1 ./latency.ompi
> > 4. mpirun -np 2 -machinefile machines -map-by node --mca btl ^openib
> >    ./latency.ompi
> >
> > None of them resulted in a significantly different benchmark output.
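> >
> > A side note on command 1: when BTLs are listed explicitly, the self
> > component must also be included, otherwise a process cannot send to
> > itself under the ob1 PML. A corrected sketch, assuming the same
> > machinefile and em1 interface:
> >
> >   mpirun -np 2 -machinefile machines -map-by node \
> >     --mca pml ob1 --mca btl tcp,self \
> >     --mca btl_tcp_if_include em1 ./latency.ompi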
> >
> > I am using Open MPI via an environment module on a cluster where I
> > don't have admin access. It is configured with support for both TCP and
> > OpenIB (confirmed via ompi_info). After trying all of the above methods
> > without success, I installed Open MPI v1.8.2 in my home directory and
> > disabled openib with the following configure options:
> >
> >   --disable-openib-control-hdr-padding --disable-openib-dynamic-sl
> >   --disable-openib-connectx-xrc --disable-openib-udcm
> >   --disable-openib-rdmacm --disable-btl-openib-malloc-alignment
> >   --disable-io-romio --without-openib --without-verbs
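> >
> > For reference, a sketch of the full build (the installation prefix is
> > illustrative):
> >
> >   ./configure --prefix=$HOME/opt/openmpi-1.8.2 \
> >     --without-openib --without-verbs [plus the other options above]
> >   make all install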
> >
> > Now openib is not enabled (again confirmed via ompi_info), and there is
> > no "openib.so" file in the $prefix/lib/openmpi directory either. Still,
> > the mpirun commands above produce the same latency and bandwidth as
> > InfiniBand.
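> >
> > (A quick way to double-check which BTL components were built:
> >
> >   ompi_info | grep "MCA btl"
> >
> > should now list tcp, self, and sm, but no openib.)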
> >
> > I also ran mpirun in verbose mode with the following command; the
> > output is below.
> >
> > Command:
> > mpirun -np 2 -machinefile machines -map-by node --mca btl tcp
> >   --mca btl_base_verbose 30 --mca btl_tcp_if_include em1 ./latency.ompi
> >
> > Output:
> > [node123.prv.sciama.cluster:88310] mca: base: components_register: registering btl components
> > [node123.prv.sciama.cluster:88310] mca: base: components_register: found loaded component tcp
> > [node123.prv.sciama.cluster:88310] mca: base: components_register: component tcp register function successful
> > [node123.prv.sciama.cluster:88310] mca: base: components_open: opening btl components
> > [node123.prv.sciama.cluster:88310] mca: base: components_open: found loaded component tcp
> > [node123.prv.sciama.cluster:88310] mca: base: components_open: component tcp open function successful
> > [node124.prv.sciama.cluster:90465] mca: base: components_register: registering btl components
> > [node124.prv.sciama.cluster:90465] mca: base: components_register: found loaded component tcp
> > [node124.prv.sciama.cluster:90465] mca: base: components_register: component tcp register function successful
> > [node124.prv.sciama.cluster:90465] mca: base: components_open: opening btl components
> > [node124.prv.sciama.cluster:90465] mca: base: components_open: found loaded component tcp
> > [node124.prv.sciama.cluster:90465] mca: base: components_open: component tcp open function successful
> > Hello, world.  I am 1 on node124
> > Hello, world.  I am 0 on node123
> > Size Latency (usec) Bandwidth (Mbps)
> > 1    4.18    1.83
> > 2    3.66    4.17
> > 4    4.08    7.48
> > 8    3.12    19.57
> > 16    3.83    31.84
> > 32    3.40    71.84
> > 64    4.10    118.97
> > 128    3.89    251.19
> > 256    4.22    462.77
> > 512    2.95    1325.71
> > 1024    2.63    2969.49
> > 2048    3.38    4628.29
> > [node123.prv.sciama.cluster:88310] mca: base: close: component tcp closed
> > [node123.prv.sciama.cluster:88310] mca: base: close: unloading component tcp
> > [node124.prv.sciama.cluster:90465] mca: base: close: component tcp closed
> > [node124.prv.sciama.cluster:90465] mca: base: close: unloading component tcp
> >
> > Moreover, the same benchmark applications run with MPICH work fine over
> > Ethernet and achieve the expected latency and bandwidth.
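> >
> > For comparison, the MPICH runs were invoked along these lines (the
> > hostnames and binary name are illustrative):
> >
> >   mpiexec -n 2 -hosts node123,node124 ./latency.mpich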
> >
> > How can this be fixed?
> >
> > Thanks for help,
> >
> > --Ansar
> >
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
