Message: 2
List-Post: users@lists.open-mpi.org
Date: Mon, 13 Mar 2006 08:42:59 -0500
From: Brian Barrett <brbar...@open-mpi.org>
Subject: Re: [OMPI users] Using Multiple Gigabit Ethernet Interface
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <8f91ac34-6393-4173-84ef-5e2ac59be...@open-mpi.org>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
On Mar 11, 2006, at 1:00 PM, Jayabrata Chakrabarty wrote:
Hi, I have been looking for information on how to use multiple
Gigabit Ethernet interfaces for MPI communication.
So far what I have found out is that I have to use mca_btl_tcp.
But what I wish to know is what IP address to assign to each
network interface. I also wish to know if there will be any change
in the format of the "hostfile".
I have two Gigabit Ethernet interfaces on a cluster of 5 nodes at
present.
Open MPI will use all available (and active) Ethernet devices for MPI
communication by default. It does a relatively simplistic netmask
comparison to prefer connections within the same subnet (so if host A has
addresses 192.168.1.1/24 and 192.168.2.1/24 and host B has addresses
192.168.1.2/24 and 192.168.2.2/24, OMPI will make one connection
between the two 192.168.1 addresses and another between the two
192.168.2 addresses). If you have two separate switches for your two
networks (which I would expect to give the best performance), make sure
that the two networks use IP address ranges in different subnets. Other
than that, Open MPI will do the rest.
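For example (the interface names, hostfile name, and application binary
below are just placeholders), a two-subnet setup might look like this, and
the mpirun line needs nothing special for Open MPI to use both networks:

  # host A:  eth0 = 192.168.1.1/24   eth1 = 192.168.2.1/24
  # host B:  eth0 = 192.168.1.2/24   eth1 = 192.168.2.2/24

  mpirun -np 2 -hostfile myhosts -mca btl tcp,self ./a.out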
In Open MPI, the hostfile is completely independent of the MPI
communication network names, so no change is needed there.
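A minimal hostfile is just one host name per line, optionally with a slot
count per host; the host names here are made up:

  node01 slots=2
  node02 slots=2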
I believe (but I could be wrong) that there was an issue with
multiple TCP networks in 1.0.1. I believe this might have been
resolved in our upcoming 1.0.2 release. You may want to try one of
the 1.0.2 pre-releases if you run into trouble with the 1.0.1 release.
Hope this helps,
Brian
--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/
------------------------------

Dear Brian,

I have the same setup as Mr. Chakrabarty, with 16 nodes, OSCAR 4.2.1
beta 4, and two Gigabit Ethernet cards per node, connected to two
switches (16-port and 24-port), one smart and the other managed. I use
DHCP to assign the IP addresses for one Ethernet card (these range from
192.168.1.1 ... 16) and static IP addresses of 192.168.5.1 ... 16 for
the other NIC. The MTU of the first network is 9000 for both the NICs
and the switch; for the second, the MTU is 1500 for both the switch and
the NICs, as that switch cannot go beyond an MTU of 1500.
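For what it's worth, I set the jumbo-frame MTU on each node with
something like the following, where eth1 is the NIC on the 192.168.1.x
network:

  ifconfig eth1 mtu 9000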
Using the -mca btl tcp switch, with the 192.168.1.1 ... 16 NICs included
and the 192.168.5.1 ... 16 NICs excluded via -mca btl_tcp_if_include
eth1 (MTU=9000) and -mca btl_tcp_if_exclude eth0 (MTU=1500), I get an
HPL performance of approximately 28.3 GFlops with both Open MPI and MPICH2.
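The command lines I am comparing are roughly the following (the hostfile
and HPL binary names are approximate):

  # jumbo-frame network only
  mpirun -np 16 -hostfile myhosts -mca btl tcp,self \
      -mca btl_tcp_if_include eth1 ./xhpl

  # both Gigabit networks
  mpirun -np 16 -hostfile myhosts -mca btl tcp,self \
      -mca btl_tcp_if_include eth0,eth1 ./xhpl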
But since, as you say above, including both Gigabit cards with the
switch -mca btl_tcp_if_include eth0,eth1 (using Open MPI 1.1 beta or
1.0.1) should increase performance for the same N and NB in HPL, I
expected an improvement; instead I get a slight performance decrease of
about 0.5 to 1 GFlops. The hostfile is simply a1, a2 ... a16, using
OSCAR's DNS to resolve the host names. Why is there a performance
decrease?

Regards,
Allan Menezes