Look at your ifconfig output and select the Ethernet device (instead of the
IPoIB one). Traditionally the name lacks any fanciness; most distributions
use eth0 as the default.
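
For example (just a sketch; em1 below is the name from your own commands and
ib0 is a typical IPoIB device name, so substitute whatever ifconfig actually
shows on your nodes):

  # List all interfaces: ib*-style devices are IPoIB, em*/eth*-style are Ethernet
  ifconfig -a            # or: ip addr show

  # The ARP hardware type also tells them apart: 1 = Ethernet, 32 = InfiniBand
  cat /sys/class/net/em1/type
  cat /sys/class/net/ib0/type

  # Then point the TCP BTL only at the real Ethernet device
  mpirun -np 2 -machinefile machines -map-by node \
         --mca btl tcp,self,sm --mca btl_tcp_if_include em1 ./latency.ompi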

  George.


On Tue, Sep 9, 2014 at 11:24 PM, Muhammad Ansar Javed <
muhammad.an...@seecs.edu.pk> wrote:

> Hi,
>
> I am currently conducting some testing on a system with Gigabit Ethernet
> and InfiniBand interconnects. Both latency and bandwidth benchmarks perform
> as expected on the InfiniBand interconnect, but the Ethernet interconnect
> is achieving far higher performance than expected: Ethernet and InfiniBand
> are delivering equivalent performance.
>
> For some reason, it looks like Open MPI (v1.8.1) is using the InfiniBand
> interconnect rather than the Gigabit one, or the TCP communication is being
> carried over the InfiniBand interconnect.
>
> Here are Latency and Bandwidth benchmark results.
> #---------------------------------------------------
> # Benchmarking PingPong
> # processes = 2
> # map-by node
> #---------------------------------------------------
>
> Hello, world.  I am 1 on node124
> Hello, world.  I am 0 on node123
> Size Latency (usec) Bandwidth (Mbps)
> 1    1.65    4.62
> 2    1.67    9.16
> 4    1.66    18.43
> 8    1.66    36.74
> 16    1.85    66.00
> 32    1.83    133.28
> 64    1.83    266.36
> 128    1.88    519.10
> 256    1.99    982.29
> 512    2.23    1752.37
> 1024    2.58    3026.98
> 2048    3.32    4710.76
>
> I read some of the FAQs and noted that Open MPI prefers the fastest
> available interconnect. In an effort to force it to use the Gigabit
> interconnect, I ran it as follows:
>
> 1. mpirun -np 2 -machinefile machines -map-by node --mca btl tcp --mca
> btl_tcp_if_include em1 ./latency.ompi
> 2. mpirun -np 2 -machinefile machines -map-by node --mca btl tcp,self,sm
> --mca btl_tcp_if_include em1 ./latency.ompi
> 3. mpirun -np 2 -machinefile machines -map-by node --mca btl ^openib --mca
> btl_tcp_if_include em1 ./latency.ompi
> 4. mpirun -np 2 -machinefile machines -map-by node --mca btl ^openib
> ./latency.ompi
>
> None of them resulted in a significantly different benchmark output.
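>
> (For what it's worth, one rough way to see which physical link actually
> carries the traffic, assuming the IPoIB device is named ib0, is to compare
> the per-interface byte counters before and after a run:
>
> cat /sys/class/net/em1/statistics/tx_bytes /sys/class/net/ib0/statistics/tx_bytes
> mpirun -np 2 -machinefile machines -map-by node --mca btl tcp,self,sm \
>     --mca btl_tcp_if_include em1 ./latency.ompi
> cat /sys/class/net/em1/statistics/tx_bytes /sys/class/net/ib0/statistics/tx_bytes
>
> Whichever counter grows by roughly the transferred volume is the interface
> that carried the data.)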
>
> I am using Open MPI by loading a module on a cluster where I don't have
> admin access. It is configured for both TCP and openib (confirmed with
> ompi_info). After trying all of the above methods without success, I
> installed Open MPI v1.8.2 in my home directory and disabled openib with the
> following configure options:
>
> --disable-openib-control-hdr-padding --disable-openib-dynamic-sl
> --disable-openib-connectx-xrc --disable-openib-udcm
> --disable-openib-rdmacm  --disable-btl-openib-malloc-alignment
> --disable-io-romio --without-openib --without-verbs
>
> Now openib is not enabled (confirmed with ompi_info) and there is no openib
> component (.so) in the $prefix/lib/openmpi directory either. Still, the
> above mpirun commands report the same latency and bandwidth as InfiniBand.
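>
> (The check was along these lines, with an illustrative install prefix:
> $HOME/openmpi-1.8.2/bin/ompi_info | grep -i openib     # prints nothing if openib is absent
> ls $HOME/openmpi-1.8.2/lib/openmpi | grep -i openib    # prints nothing if the plugin is absent
> )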
>
> I tried mpirun in verbose mode; the command and its output are below:
>
> Command:
> mpirun -np 2 -machinefile machines -map-by node --mca btl tcp --mca
> btl_base_verbose 30 --mca btl_tcp_if_include em1 ./latency.ompi
>
> Output:
> [node123.prv.sciama.cluster:88310] mca: base: components_register:
> registering btl components
> [node123.prv.sciama.cluster:88310] mca: base: components_register: found
> loaded component tcp
> [node123.prv.sciama.cluster:88310] mca: base: components_register:
> component tcp register function successful
> [node123.prv.sciama.cluster:88310] mca: base: components_open: opening btl
> components
> [node123.prv.sciama.cluster:88310] mca: base: components_open: found
> loaded component tcp
> [node123.prv.sciama.cluster:88310] mca: base: components_open: component
> tcp open function successful
> [node124.prv.sciama.cluster:90465] mca: base: components_register:
> registering btl components
> [node124.prv.sciama.cluster:90465] mca: base: components_register: found
> loaded component tcp
> [node124.prv.sciama.cluster:90465] mca: base: components_register:
> component tcp register function successful
> [node124.prv.sciama.cluster:90465] mca: base: components_open: opening btl
> components
> [node124.prv.sciama.cluster:90465] mca: base: components_open: found
> loaded component tcp
> [node124.prv.sciama.cluster:90465] mca: base: components_open: component
> tcp open function successful
> Hello, world.  I am 1 on node124
> Hello, world.  I am 0 on node123
> Size Latency(usec) Bandwidth(Mbps)
> 1    4.18    1.83
> 2    3.66    4.17
> 4    4.08    7.48
> 8    3.12    19.57
> 16    3.83    31.84
> 32    3.40    71.84
> 64    4.10    118.97
> 128    3.89    251.19
> 256    4.22    462.77
> 512    2.95    1325.71
> 1024    2.63    2969.49
> 2048    3.38    4628.29
> [node123.prv.sciama.cluster:88310] mca: base: close: component tcp closed
> [node123.prv.sciama.cluster:88310] mca: base: close: unloading component
> tcp
> [node124.prv.sciama.cluster:90465] mca: base: close: component tcp closed
> [node124.prv.sciama.cluster:90465] mca: base: close: unloading component
> tcp
>
> Moreover, the same benchmark applications using MPICH work fine over
> Ethernet and achieve the expected latency and bandwidth.
>
> How can this be fixed?
>
> Thanks for the help,
>
> --Ansar
>
>
>
