Hi,

I'm trying to get an Open MPI application running across different nodes, but I seem to have hit a snag when the processes are on different nodes, especially when the machines are on different TCP subnets. The orted daemons start up fine, but after that the application fails with the message:

[0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111

I've read in this thread

http://thread.gmane.org/gmane.comp.clustering.open-mpi.user/3427/focus=3437

that Open MPI currently can't do this, but that (pre-release?) versions of Open MPI 1.3 will work. I've tried compiling Open MPI 1.3a (nightly build) and running my program with it (compiled with the mpicc of Open MPI 1.3a), but I got the same error message.

Can anybody confirm that:

1) Open MPI has difficulties using the TCP BTL across different subnets
2) there are currently no workarounds for this

If there are solutions to this, I'd really like to know about them, as I've been trying to get this working for quite a while now.

Regards,
Jeroen Kleijer