Yes, it is strange. I ran similar benchmarks a few months back in another
environment and was able to achieve the expected results on both the Ethernet
and InfiniBand interconnects. However, in this particular environment I am
unable to force OpenMPI to use Ethernet, even though openib is not configured.
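
As a generic sanity check (this is a sketch, not output from my session),
something like the following should show whether the openib-free build in my
home directory is really the one being picked up on the compute nodes:

  which mpirun                      # should point at the home-directory install
  ompi_info | grep btl              # openib should not appear among the BTL components
  ldd ./latency.ompi | grep libmpi  # should resolve to the rebuilt libmpi, not the system module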

I have tried almost all the mpirun variants that should force OpenMPI to use
Ethernet instead of InfiniBand. Moreover, verbose mode shows that the TCP BTL
module is being used, yet the latency is far better than the expected values
for Ethernet.
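
For what it's worth, the raw em1 path can also be cross-checked outside of MPI
(the peer address below is just a placeholder, not taken from my session):

  ping -c 10 <em1 address of the other node>    # round trip over GbE should be far above the ~2 usec reported
  ip route get <em1 address of the other node>  # shows which device the kernel actually routes through
  ethtool em1 | grep Speed                      # confirms em1 really is a 1000Mb/s link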

--
Ansar


On Wed, Sep 10, 2014 at 3:43 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> This is strange. I have a similar environment with one eth and one ipoib.
> If I manually select the interface I want to use with TCP I get the
> expected results.
>
>
> Here it is over IB:
>
> mpirun -np 2 --mca btl tcp,self -host dancer00,dancer01 --mca
> btl_tcp_if_include ib1 ./NPmpi
> 1: dancer01
> 0: dancer00
> Now starting the main loop
>   0:       1 bytes   3093 times -->      0.24 Mbps in      31.39 usec
>   1:       2 bytes   3185 times -->      0.49 Mbps in      31.30 usec
>   2:       3 bytes   3195 times -->      0.73 Mbps in      31.41 usec
>   3:       4 bytes   2122 times -->      0.97 Mbps in      31.39 usec
>
>
> And here is the slightly slower eth0:
>
> mpirun -np 2 --mca btl tcp,self -host dancer00,dancer01 --mca
> btl_tcp_if_include eth0 ./NPmpi
> 0: dancer00
> 1: dancer01
> Now starting the main loop
>   0:       1 bytes   1335 times -->      0.13 Mbps in      60.55 usec
>   1:       2 bytes   1651 times -->      0.28 Mbps in      53.62 usec
>   2:       3 bytes   1864 times -->      0.45 Mbps in      51.29 usec
>   3:       4 bytes   1299 times -->      0.61 Mbps in      50.36 usec
>
>
> George.
>
> On Wed, Sep 10, 2014 at 3:40 AM, Muhammad Ansar Javed <
> muhammad.an...@seecs.edu.pk> wrote:
>
>> Thanks George,
>> I am selecting the Ethernet device (em1) in the mpirun command.
>>
>> Here is ifconfig output:
>> em1       Link encap:Ethernet  HWaddr E0:DB:55:FD:38:46
>>           inet addr:10.30.10.121  Bcast:10.30.255.255  Mask:255.255.0.0
>>           inet6 addr: fe80::e2db:55ff:fefd:3846/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:1537270190 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:136123598 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:309333740659 (288.0 GiB)  TX bytes:143480101212 (133.6
>> GiB)
>>           Memory:91820000-91840000
>>
>> Ifconfig uses the ioctl access method to get the full address
>> information, which limits hardware addresses to 8 bytes.
>> Because Infiniband address has 20 bytes, only the first 8 bytes are
>> displayed correctly.
>> Ifconfig is obsolete! For replacement check ip.
>> ib0       Link encap:InfiniBand  HWaddr
>> 80:00:00:03:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>>           inet addr:10.32.10.121  Bcast:10.32.255.255  Mask:255.255.0.0
>>           inet6 addr: fe80::211:7500:70:6ab4/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
>>           RX packets:33621 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:365 errors:0 dropped:5 overruns:0 carrier:0
>>           collisions:0 txqueuelen:256
>>           RX bytes:1882728 (1.7 MiB)  TX bytes:21920 (21.4 KiB)
>>
>> lo        Link encap:Local Loopback
>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>           inet6 addr: ::1/128 Scope:Host
>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>           RX packets:66889 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:66889 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:0
>>           RX bytes:19005445 (18.1 MiB)  TX bytes:19005445 (18.1 MiB)
>>
>>
>>
>>
>>
>>
>>> Date: Wed, 10 Sep 2014 00:06:51 +0900
>>> From: George Bosilca <bosi...@icl.utk.edu>
>>> To: Open MPI Users <us...@open-mpi.org>
>>> Subject: Re: [OMPI users] Forcing OpenMPI to use Ethernet interconnect
>>>         instead of InfiniBand
>>>
>>>
>>> Look at your ifconfig output and select the Ethernet device (instead of
>>> the IPoIB one). Traditionally the name lacks any fanciness; most
>>> distributions use eth0 as the default.
>>>
>>>   George.
>>>
>>>
>>> On Tue, Sep 9, 2014 at 11:24 PM, Muhammad Ansar Javed <
>>> muhammad.an...@seecs.edu.pk> wrote:
>>>
>>> > Hi,
>>> >
>>> > I am currently conducting some testing on a system with Gigabit and
>>> > InfiniBand interconnects. Both latency and bandwidth benchmarks perform
>>> > as expected on the InfiniBand interconnect, but the Ethernet interconnect
>>> > is achieving far higher performance than expected. In fact, Ethernet and
>>> > InfiniBand are achieving equivalent performance.
>>> >
>>> > For some reason it looks like OpenMPI (v1.8.1) is either using the
>>> > InfiniBand interconnect rather than Gigabit, or the TCP communication is
>>> > being carried over the InfiniBand interconnect (e.g. via IPoIB).
>>> >
>>> > Here are Latency and Bandwidth benchmark results.
>>> > #---------------------------------------------------
>>> > # Benchmarking PingPong
>>> > # processes = 2
>>> > # map-by node
>>> > #---------------------------------------------------
>>> >
>>> > Hello, world.  I am 1 on node124
>>> > Hello, world.  I am 0 on node123
>>> > Size Latency (usec) Bandwidth (Mbps)
>>> > 1    1.65    4.62
>>> > 2    1.67    9.16
>>> > 4    1.66    18.43
>>> > 8    1.66    36.74
>>> > 16    1.85    66.00
>>> > 32    1.83    133.28
>>> > 64    1.83    266.36
>>> > 128    1.88    519.10
>>> > 256    1.99    982.29
>>> > 512    2.23    1752.37
>>> > 1024    2.58    3026.98
>>> > 2048    3.32    4710.76
>>> >
>>> > I read some of the FAQs and noted that OpenMPI prefers the fastest
>>> > available interconnect. In an effort to force it to use the Gigabit
>>> > interconnect, I ran it as follows:
>>> >
>>> > 1. mpirun -np 2 -machinefile machines -map-by node --mca btl tcp
>>> >    --mca btl_tcp_if_include em1 ./latency.ompi
>>> > 2. mpirun -np 2 -machinefile machines -map-by node --mca btl tcp,self,sm
>>> >    --mca btl_tcp_if_include em1 ./latency.ompi
>>> > 3. mpirun -np 2 -machinefile machines -map-by node --mca btl ^openib
>>> >    --mca btl_tcp_if_include em1 ./latency.ompi
>>> > 4. mpirun -np 2 -machinefile machines -map-by node --mca btl ^openib
>>> >    ./latency.ompi
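>>> >
>>> > (A fifth variant, noted here only as an untested possibility and assuming
>>> > btl_tcp_if_exclude is honoured the same way as btl_tcp_if_include, would
>>> > exclude the IPoIB and loopback interfaces instead of including em1:
>>> >    mpirun -np 2 -machinefile machines -map-by node --mca btl tcp,self,sm
>>> >    --mca btl_tcp_if_exclude ib0,lo ./latency.ompi )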
>>> >
>>> > None of them resulted in a significantly different benchmark output.
>>> >
>>> > I am using OpenMPI by loading a module on the cluster and don't have
>>> > admin access. That build is configured for both TCP and OpenIB (confirmed
>>> > from ompi_info). After trying all the above methods without success, I
>>> > installed OpenMPI v1.8.2 in my home directory and disabled openib with
>>> > the following configure options:
>>> >
>>> > --disable-openib-control-hdr-padding --disable-openib-dynamic-sl
>>> > --disable-openib-connectx-xrc --disable-openib-udcm
>>> > --disable-openib-rdmacm  --disable-btl-openib-malloc-alignment
>>> > --disable-io-romio --without-openib --without-verbs
>>> >
>>> > Now openib is not enabled (confirmed with ompi_info) and there is no
>>> > "openib.so" file in the $prefix/lib/openmpi directory either. Still, the
>>> > above mpirun commands get the same latency and bandwidth as InfiniBand.
>>> >
>>> > I tried mpirun in verbose mode; the command and its output are shown
>>> > below.
>>> >
>>> > Command:
>>> > mpirun -np 2 -machinefile machines -map-by node --mca btl tcp --mca
>>> > btl_base_verbose 30 --mca btl_tcp_if_include em1 ./latency.ompi
>>> >
>>> > Output:
>>> > [node123.prv.sciama.cluster:88310] mca: base: components_register: registering btl components
>>> > [node123.prv.sciama.cluster:88310] mca: base: components_register: found loaded component tcp
>>> > [node123.prv.sciama.cluster:88310] mca: base: components_register: component tcp register function successful
>>> > [node123.prv.sciama.cluster:88310] mca: base: components_open: opening btl components
>>> > [node123.prv.sciama.cluster:88310] mca: base: components_open: found loaded component tcp
>>> > [node123.prv.sciama.cluster:88310] mca: base: components_open: component tcp open function successful
>>> > [node124.prv.sciama.cluster:90465] mca: base: components_register: registering btl components
>>> > [node124.prv.sciama.cluster:90465] mca: base: components_register: found loaded component tcp
>>> > [node124.prv.sciama.cluster:90465] mca: base: components_register: component tcp register function successful
>>> > [node124.prv.sciama.cluster:90465] mca: base: components_open: opening btl components
>>> > [node124.prv.sciama.cluster:90465] mca: base: components_open: found loaded component tcp
>>> > [node124.prv.sciama.cluster:90465] mca: base: components_open: component tcp open function successful
>>> > Hello, world.  I am 1 on node124
>>> > Hello, world.  I am 0 on node123
>>> > Size Latency(usec) Bandwidth(Mbps)
>>> > 1    4.18    1.83
>>> > 2    3.66    4.17
>>> > 4    4.08    7.48
>>> > 8    3.12    19.57
>>> > 16    3.83    31.84
>>> > 32    3.40    71.84
>>> > 64    4.10    118.97
>>> > 128    3.89    251.19
>>> > 256    4.22    462.77
>>> > 512    2.95    1325.71
>>> > 1024    2.63    2969.49
>>> > 2048    3.38    4628.29
>>> > [node123.prv.sciama.cluster:88310] mca: base: close: component tcp closed
>>> > [node123.prv.sciama.cluster:88310] mca: base: close: unloading component tcp
>>> > [node124.prv.sciama.cluster:90465] mca: base: close: component tcp closed
>>> > [node124.prv.sciama.cluster:90465] mca: base: close: unloading component tcp
>>> >
>>> > Moreover, the same benchmark applications run with MPICH work fine on
>>> > Ethernet and achieve the expected latency and bandwidth.
>>> >
>>> > How can this be fixed?
>>> >
>>> > Thanks for help,
>>> >
>>> > --Ansar
>>>
>>
>>
>>
>>
>> --
>> Regards
>>
>> Ansar Javed
>> HPC Lab
>> SEECS NUST
>> Contact: +92 334 438 9394
>> Skype: ansar.javed.859
>> Email: muhammad.an...@seecs.edu.pk
>>
