Sorry for the delay in replying -- I thought I had replied to this
already, but I guess I hadn't. :-(
We've talked about this feature several times, but this specific
functionality hasn't made it into the OMPI code base yet. Sorry! :-(
(patches would be gladly accepted, but note that we'll likely be kinda
picky about this code because it's a little hairy and complex...)
On Sep 19, 2008, at 7:00 PM, Jeroen Kleijer wrote:
Hi,
I'm trying to get an Open MPI application running across multiple
nodes, but I seem to have hit a snag when the processes are on different
nodes, especially when the machines are on different TCP subnets.
The orted daemons start up fine, but after that the application borks
with the message:
[0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=111
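On Linux, errno 111 is ECONNREFUSED ("Connection refused"). That usually means the peer host was reached, but nothing accepted the connection on that address and port (for example, nothing was listening there, or a firewall actively rejected it). The numeric value is platform-specific, so it is safer to decode it with strerror(). A minimal sketch, assuming a POSIX system and IPv4, of a stand-alone probe for checking plain TCP reachability between two nodes outside of Open MPI follows. The helper name probe_tcp is hypothetical and is not part of Open MPI:

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical diagnostic helper (not an Open MPI API):
 * attempt a blocking TCP connect() to an IPv4 address and port.
 * Returns 0 on success, EINVAL for an unparsable address,
 * otherwise the errno value that socket() or connect() set
 * (e.g. ECONNREFUSED, which is 111 on Linux). */
static int probe_tcp(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);           /* network byte order */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return EINVAL;                      /* not a dotted-quad IPv4 address */
    }

    int rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    int err = (rc == 0) ? 0 : errno;        /* capture errno before close() */
    close(fd);
    return err;
}
```

For example, from a node on one subnet, probe a port you know is open on a node in the other subnet (such as sshd's port 22, if ssh is used to start the daemons) and print the result with strerror(). If that succeeds but the MPI job still fails, plain routing between the subnets is probably fine, and the problem lies in which addresses the processes try to reach each other on.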
I've read in this thread
http://thread.gmane.org/gmane.comp.clustering.open-mpi.user/3427/focus=3437
that Open MPI currently can't do this, but that (pre-release?) versions
of Open MPI 1.3 will.
I've tried compiling Open MPI 1.3a (a nightly build) and running my
program with it (compiled with the Open MPI 1.3a mpicc), but I got
the same error message.
Can anybody confirm that:
1) Open MPI has difficulties using the TCP BTL across different subnets
2) there are currently no workarounds for this.
If there are solutions to this I'd really like to know about it as
I've been trying this for quite a while now.
Regards,
Jeroen Kleijer
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems