Hi Andreas:
You can either exclude eth0 from, or include only eth1 in, the OpenMPI
TCP byte transfer layer (BTL).
To do that you need to add these flags to your mpiexec command line:
-mca btl tcp,sm,self
-mca btl_tcp_if_exclude lo,eth0
or
-mca btl tcp,sm,self
-mca btl_tcp_if_include eth1
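For example, a complete command line using the second variant might look
like the sketch below. The process count, the hosts file path (./hosts),
and the program name (./xhpl) are placeholders I made up for illustration,
so substitute your own. The script only builds and prints the command, so
you can check it before running it.

```shell
#!/bin/sh
# Dry-run sketch: build the mpiexec command line and print it, not run it.
# NP, HOSTFILE and PROGRAM are hypothetical placeholders, not from the thread.
NP=2
HOSTFILE=./hosts
PROGRAM=./xhpl

# Restrict the TCP BTL to eth1; sm and self cover on-node and self traffic.
MCA_FLAGS="-mca btl tcp,sm,self -mca btl_tcp_if_include eth1"

CMD="mpiexec $MCA_FLAGS -np $NP -hostfile $HOSTFILE $PROGRAM"
echo "$CMD"
```

To use the exclude variant instead, swap the second flag for
"-mca btl_tcp_if_exclude lo,eth0".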
See this FAQ for more info:
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
(BTW, the OpenMPI FAQs are a great resource!)
You can use the default hosts file (10.42.0.21, 10.42.0.22).
At least it works fine this way for me here,
and diverts all the MPI traffic to the eth1 subnet.
Changing the hosts/machines file would be needed in MPICH2,
not in OpenMPI, as far as I know.
(Here we also use the eth0 network for login, control, and I/O,
which I suppose is what you want to do.
We run both OpenMPI and MPICH2.)
Of course your 10.0.1.0 network should be working correctly (and
separate from the 10.42.0.0 net).
You can check this with the usual tools (ping, etc.).
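A quick check of the eth1 link could look like the sketch below. It uses
the two eth1 addresses from your message and assumes the Linux (iputils)
ping, where -I selects the outgoing interface. It only prints the ping
commands, so you can run each one by hand on the appropriate node.

```shell
#!/bin/sh
# Sketch: print (not run) one ping check per eth1 address.
# Addresses come from the original message; -I eth1 assumes iputils ping.
PEERS="10.0.1.21 10.0.1.22"
CHECKS=""
for ip in $PEERS; do
    # -c 3 sends three probes; -I eth1 forces the eth1 interface.
    line="ping -c 3 -I eth1 $ip"
    echo "$line"
    CHECKS="$CHECKS$line;"
done
```

If the probes to the other node's 10.0.1.x address get no reply, the
problem is in the network setup rather than in OpenMPI.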
I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Andreas Hoelzlwimmer wrote:
Hello,
I’m using Open-MPI on a small cluster of RHEL 5.3 nodes, with the current
MPI version. I now need to run MPI over a specific adapter, in this case
the “eth1” interface of my system. That adapter is not the default
adapter (eth0), which has to carry all the rest of the traffic, but I
cannot make MPI use the other adapter and therefore a different
IP address.
The exact problem, shown on 2 nodes:
Node 1:
eth0: 10.42.0.21
eth1: 10.0.1.21
Node 2:
eth0: 10.42.0.22
eth1: 10.0.1.22
For testing purposes, I linked the eth1 adapters of both machines
together directly and access the machines remotely via eth0. If I now
try to run an MPI program (in this case the MPI benchmark HPL) with a
hosts file that specifies 10.0.1.21 and 10.0.1.22 as hosts, it gets
quite problematic. The “netstat -a” command shows me that it uses the
10.42.* addresses for the connection. The --debug-daemons flag tells me
that MPI initializes both nodes, but after that it runs forever and does
not terminate. In addition, apart from an initial couple of packets, it
does not send any network traffic over either of the network adapters.
Please tell me if any of you have encountered such a problem or setup and
can tell me how to fix it. I tried modifying routing tables and playing
around with subnetting, but I wasn’t able to get a successful connection.
If you need more information on that, please tell me. Please note that I’m
quite new to Open-MPI, so it might be something about Open-MPI
I just haven’t discovered yet.
Best regards,
Andreas Hoelzlwimmer