On Mon, 2007-12-17 at 17:19 -0500, Jeff Squyres wrote:
> On Dec 17, 2007, at 8:35 AM, Marco Sbrighi wrote:
> 
> > I'm using Open MPI 1.2.2 over OFED 1.2 on a 256-node, dual-Opteron,
> > dual-core Linux cluster, with an InfiniBand 4x interconnect of course.
> >
> > Each cluster node is equipped with 4 (or more) Ethernet interfaces,
> > namely 2 gigabit ones plus 2 IPoIB. The two gigabit ones are named
> > eth0 and eth1, while the two IPoIB ones are named ib0 and ib1.
> >
> > eth0 is a management network with poor performance, and furthermore
> > we don't want to use ib* to carry MPI's traffic (neither OOB nor TCP),
> > so we would like eth1 to be used for Open MPI OOB and TCP.
> >
> > In order to drive the OOB over only eth1 I've tried various
> > combinations of oob_tcp_[ex|in]clude MCA statements, starting from
> > the obvious
> >
> > oob_tcp_exclude = lo,eth0,ib0,ib1
> >
> > then trying the other obvious one:
> >
> > oob_tcp_include = eth1
> 
> This one statement (_include) should be sufficient.

I agree with your interpretation, but what I'm experiencing here is that
it should work, but in fact it doesn't .....

> 
> Presumably this statement (or these statements) is in a config file
> that is being read by Open MPI, such as $HOME/.openmpi/mca-params.conf?

I've tried many combinations: only in $HOME/.openmpi/mca-params.conf,
only on the command line, and both; but none seems to work correctly.
Nevertheless, what I'm expecting is that if something is specified in
$HOME/.openmpi/mca-params.conf and then specified differently on the
command line, the latter should take effect, I think.
> 
> > and both at the same time.
> >
> > Next I've tried the following:
> >
> > oob_tcp_exclude = eth0
> >
> > but after the job starts, I still have a lot of TCP connections
> > established using eth0 or ib0 or ib1.
> > Furthermore, the following error occurs:
> >
> >   [node191:03976] [0,1,14]-[0,1,12] mca_oob_tcp_peer_complete_connect:
> > connection failed: Connection timed out (110) - retrying
> 
> This is quite odd.  :-(
> 
> > I've found only one way to have TCP connections bound only to
> > the eth1 interface: using both of the following MCA directives on
> > the command line:
> >
> > mpirun .... --mca oob_tcp_include eth1 --mca oob_tcp_include lo,eth0,ib0,ib1 .....
> >
> > This sounds like a bug to me.
> 
> Yes, it does.  Specifying the same MCA param twice on the command line
> results in undefined behavior -- it will only take one of them, and I  
> assume it'll take the first (but I'd have to check the code to be sure).

OK, I can obtain the same behaviour using only one statement: 
--mca oob_tcp_include eth1,lo,eth0,ib0,ib1

Note that with --mca mpi_show_mca_params, what I'm seeing in the report
is the same in both cases (parameter given twice or only once):

.....
[node255:30188] oob_tcp_debug=0
[node255:30188] oob_tcp_include=eth1,lo,eth0,ib0,ib1
[node255:30188] oob_tcp_exclude=
.......
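
For anyone who wants to check the same thing on their own nodes, here is a minimal sketch of how established TCP connections could be counted per local interface. It assumes the usual Linux output formats of `ip -o -4 addr show` and `netstat -tn`; the function name count_conns_by_iface and the file names are hypothetical and are not part of Open MPI.

```shell
# count_conns_by_iface ADDR_FILE NETSTAT_FILE
#   ADDR_FILE:    saved output of `ip -o -4 addr show`
#   NETSTAT_FILE: saved output of `netstat -tn`
# Prints "<interface> <count>" for every local interface that carries at
# least one ESTABLISHED TCP connection (hypothetical helper, illustration only).
count_conns_by_iface() {
    awk '
        # First file: remember which interface owns each local IPv4 address.
        NR == FNR {
            if ($3 == "inet") { split($4, a, "/"); iface[a[1]] = $2 }
            next
        }
        # Second file: count established connections by their local address.
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
            addr = $4
            sub(/:[0-9]+$/, "", addr)          # strip the ":port" suffix
            name = (addr in iface) ? iface[addr] : "unknown"
            count[name]++
        }
        END { for (n in count) print n, count[n] }
    ' "$1" "$2" | sort
}

# Possible usage on a compute node while the job is running:
#   ip -o -4 addr show > addrs.txt
#   netstat -tn        > conns.txt
#   count_conns_by_iface addrs.txt conns.txt
```

If the include setting were honoured, one would expect only eth1 (and possibly lo) to show up in the output.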


> 
> > Is there someone able to reproduce this behaviour?
> > If this is a bug, are there fixes?
> 
> 
> I'm unfortunately unable to reproduce this behavior.  I have a test  
> cluster with 2 IP interfaces: ib0, eth0.  I have tried several  
> combinations of MCA params with 1.2.2:
> 
>     --mca oob_tcp_include ib0
>     --mca oob_tcp_include ib0,bogus
>     --mca oob_tcp_include eth0
>     --mca oob_tcp_include eth0,bogus
>     --mca oob_tcp_exclude ib0
>     --mca oob_tcp_exclude ib0,bogus
>     --mca oob_tcp_exclude eth0
>     --mca oob_tcp_exclude eth0,bogus
> 
> All do as they are supposed to -- including or excluding ib0 or eth0.
> 
> I do note, however, that the handling of these parameters changed in  
> 1.2.3 -- as well as their names.  The names changed to  
> "oob_tcp_if_include" and "oob_tcp_if_exclude" to match other MCA  
> parameter name conventions from other components.
> 
> Could you try with 1.2.3 or 1.2.4 (1.2.4 is the most recent; 1.2.5 is  
> due out "soon" -- it *may* get out before the holiday break, but no  
> promises...)?

We have 1.2.3 on another cluster and it shows the same behaviour as
1.2.2 .... (BTW the other cluster has the same eth ifaces)

> 
> If you can't upgrade, let me know and I can provide a debugging patch  
> that will give us a little more insight into what is happening on your  
> machines.  Thanks.

It is quite difficult for us to upgrade Open MPI now. We have the
official Cisco packages installed, and I know that 1.2.2-1 is the only
official Cisco Open MPI distribution today ....

In any case I would like to try your debug patch.

Thanks

Marco 

> 
-- 
-----------------------------------------------------------------
 Marco Sbrighi  m.sbri...@cineca.it

 HPC Group
 CINECA Interuniversity Computing Centre
 via Magnanelli, 6/3
 40033 Casalecchio di Reno (Bo) ITALY
 tel. 051 6171516
