That is indeed the expected behavior, and your solution is the correct
one.
The orted has no way of knowing which interface mpirun can be reached
on, so it has no choice but to work its way through the available
ones. Because of the order in which the OS reports the
interfaces, it picks up the public one first - so that is the
first one it tries.
Telling it the right one to use is the only solution.
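As a quick sketch of what that looks like (the process count and
application name below are just placeholders, and bond0,eth0 are the
interfaces from your own workaround):

   mpirun --mca oob_tcp_if_include bond0,eth0 -np 30 ./my_app

That restricts the OOB/TCP channel to the interfaces you name, so the
orteds connect back over the cluster-facing network on the first try.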
On Nov 12, 2009, at 7:35 PM, Aaron Knister wrote:
Dear List,
I'm having a really weird issue with openmpi - version 1.3.3
(version 1.2.8 doesn't seem to exhibit this behavior). Essentially
when I start jobs from the cluster front-end node using mpirun,
mpirun sits idle for up to a minute and a half (for 30 nodes) before
running the command I've given it. Running the same command on any
other node in the cluster returns in a fraction of a second. Upon
further investigation it appears it's an issue with the way the orted
processes on the compute nodes attempt to talk back to the front-end
node. When I launch mpirun from the front-end node, this is the
process it spawns on the compute node (public IP scrambled for
security purposes):
orted --daemonize -mca ess env -mca orte_ess_jobid 1816657920 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri 1816657920.0;tcp://130.X.X.X:56866;tcp://172.40.10.1:56866;tcp://172.20.10.1:56866
Throwing in some firewall debugging rules indicated that the compute
nodes were trying to talk back to mpirun on the front-end node over
the front-end node's public IP. Based on this, and looking at the
arguments passed above, it seemed as though the public IP of the
front-end node was being tried before any of its private IPs, and the
delay I was seeing was orted waiting for the connection to the front-
end node's public IP to time out before it tried the cluster-facing
IP, at which point the connection succeeded.
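For anyone who wants to reproduce the check, the debugging rule I used
on the front-end node was roughly along these lines (a sketch, not the
exact rule - the port is whatever orted reports in the --hnp-uri for
that particular run, so 56866 is only an example):

   iptables -I INPUT -p tcp -d 130.X.X.X --dport 56866 -j LOG --log-prefix "oob-tcp: "

which logs each connection attempt arriving on the public address.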
I was able to work around this by specifying "--mca
oob_tcp_if_include bond0,eth0" to mpirun (the front-end node has two
bonded NICs as its cluster-facing interface). When I provided that
argument, the previously experienced delay disappeared. I could
easily put that into openmpi-mca-params.conf and be done with the
problem, but I would like to know why Open MPI chose to use the
node's public IP before its internal IPs, and whether this is
expected behavior. I suspect that it may not be.
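If I do go the config-file route, the line I'd drop into
openmpi-mca-params.conf (typically under the install's etc/ directory,
or in ~/.openmpi/mca-params.conf for a per-user setting) would just be
the same parameter in key = value form:

   oob_tcp_if_include = bond0,eth0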
-Aaron