Jeff Squyres wrote:
On Oct 18, 2007, at 9:24 AM, Marcin Skoczylas wrote:
PML add procs failed
--> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
Yoinks -- OMPI is determining that it can't use the TCP BTL to reach
other hosts.
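FWIW, you can tell the TCP BTL explicitly which interfaces to consider with the btl_tcp_if_include / btl_tcp_if_exclude MCA parameters -- something like the following (the interface names and process count are just examples):

$ mpirun --mca btl_tcp_if_include eth1 -np 4 ./my_app
$ mpirun --mca btl_tcp_if_exclude lo,eth0 -np 4 ./my_app

That only restricts which interfaces Open MPI looks at, though; it does not change the netmask comparison itself, so it may or may not help here.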
I assume this could be because of:
$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.125.17.0    *               255.255.255.0   U     0      0        0 eth1
192.168.12.0    *               255.255.255.0   U     0      0        0 eth1
161.254.0.0     *               255.255.0.0     U     0      0        0 eth1
default         192.125.17.1    0.0.0.0         UG    0      0        0 eth1
192.125 -- is that supposed to be a private address? If so, that's
not really the Right way to do things...
Actually, the configuration here is quite strange; this is not a private
address. The head node sits on a public address from the 192.125.17.0 net
(routable from outside), while the workers are on 192.168.12.0.
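If I read the FAQ correctly, the reachability check is roughly a per-subnet comparison, so with made-up addresses on our two nets it would go something like:

192.125.17.5  AND 255.255.255.0  =  192.125.17.0   (head node)
192.168.12.10 AND 255.255.255.0  =  192.168.12.0   (worker)

The two results differ, so the TCP BTL decides the peer is unreachable, even though the routing table above would deliver the packets directly over eth1.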
So "narrowly scoped netmasks" which (as it's written in the FAQ)
are not
supported in the OpenMPI. I asked for a workaround on this newsgroup
some time ago - but no answer uptill now. So my question is: what
alternative should I choose that will work in such configuration?
We haven't put in a workaround because (to be blunt) we either forgot
about it and/or not enough people have asked for it. Sorry. :-(
It probably wouldn't be too hard to put in an MCA parameter to say
"don't do netmask comparisons; just assume that every IP address is
reachable by every other IP address."
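If we did add it, setting it would presumably look like any other MCA parameter; the parameter name below is made up purely for illustration (it does not exist today):

$ mpirun --mca btl_tcp_ignore_netmasks 1 -np 4 ./my_app

or, set once per user in ~/.openmpi/mca-params.conf:

btl_tcp_ignore_netmasks = 1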
That would be really great! I hope it's not too complicated to add.
George -- did you mention that you were working on this at one point?
Do you
have some experience with other MPI implementations, for example LAM/MPI?
LAM/MPI should be able to work just fine in this environment; it
doesn't do any kind of reachability computations like Open MPI does
-- it blindly assumes that every MPI process is reachable by every
other MPI process.
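For what it's worth, the basic LAM/MPI workflow is short and runs entirely as a normal user, no root account required -- roughly like this, with placeholder hostnames and program name:

$ cat hostfile
node1
node2
$ lamboot -v hostfile      # start the LAM run-time daemons on the listed hosts
$ mpirun -np 4 ./my_app    # run the MPI job
$ lamhalt                  # shut the daemons down when finished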
First, I'm going to have a discussion with the administrators here and do
some more checks... and then I'll try LAM/MPI. The thing is that I'm not
familiar with it at all; I have always used Open MPI instead. I hope the
configuration is as easy as Open MPI's and that it will work without a
root account.
Thank you for your help!
regards, Marcin