Dear Open MPI developers,

I'm using Open MPI 1.2.2 over OFED 1.2 on a 256-node, dual-Opteron, dual-core Linux cluster with a 4x InfiniBand interconnect.

Each cluster node is equipped with four (or more) network interfaces, namely two gigabit Ethernet ones plus two IPoIB ones. The two gigabit interfaces are named eth0 and eth1, while the two IPoIB interfaces are named ib0 and ib1.

eth0 is a management network with poor performance, and furthermore we do not want to use the ib* interfaces to carry MPI traffic (neither OOB nor TCP), so we would like eth1 to be used for Open MPI OOB and TCP.

In order to drive the OOB over eth1 only, I've tried various combinations of the oob_tcp_[ex|in]clude MCA statements, starting from the obvious

    oob_tcp_exclude = lo,eth0,ib0,ib1

then trying the other obvious one:

    oob_tcp_include = eth1

and both at the same time.

Next I tried the following:

    oob_tcp_exclude = eth0

but after the job starts, I still see a lot of TCP connections established over eth0, ib0 or ib1. Furthermore, the following error occurs:

    [node191:03976] [0,1,14]-[0,1,12] mca_oob_tcp_peer_complete_connect: connection failed: Connection timed out (110) - retrying

The only way I have found to get TCP connections bound only to the eth1 interface is to use both of the following MCA directives on the command line:

    mpirun .... --mca oob_tcp_include eth1 --mca oob_tcp_include lo,eth0,ib0,ib1 .....

This looks like a bug to me. Is anyone able to reproduce this behaviour? If it is a bug, are there fixes?

Thanks.

Marco

--
-----------------------------------------------------------------
Marco Sbrighi  m.sbri...@cineca.it
HPC Group
CINECA Interuniversity Computing Centre
via Magnanelli, 6/3
40033 Casalecchio di Reno (Bo) ITALY
tel. 051 6171516