Hello

It was indeed a problem with the firewall. Thanks.

Best regards
Roland Albrecht

> Do you have the Linux firewall running on either of your machines,
> perchance? This can either block random socket connections between
> nodes (which Open MPI's TCP communication will use) or eat the
> connection requests in a black-hole fashion such that the connections
> will time out.
>
> On Jan 16, 2008, at 5:35 AM, Roland Albrecht wrote:
>
> > Hello
> >
> > I'm running an FDTD program (meep) using open-mpi on a mini-cluster
> > consisting of 2 computers. Since the exchange of the mainboard on the
> > node (with an identical one as before) I have a problem. I can't
> > find the change in the configuration which is now causing the
> > problem.
> >
> > Here's my problem:
> > I can start the meep application by mpi-run on each node
> > individually and the program runs without any problems.
> > However, when I try to run the program distributed over both
> > computers I get at some point the following error message:
> > ...[0,1,1][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
> > connect() failed with errno=110
> > which Perl translates as: Connection timed out at -e line 1.
> >
> > However, I can't figure out where the problem lies in my network
> > configuration. SSH tunnels from one computer to the other work. I
> > can also reach the internet from the node.
> >
> > In the attached archive there's the config.log from the top open-mpi
> > tree, there's the output of ompi_info --all and there's the network
> > configuration of both computers.
> >
> > I'm really grateful for any help. Thank you!
> >
> > Best regards
> > Roland Albrecht
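
For anyone who finds this thread with the same errno=110: a quick way to tell a packet-dropping firewall from other causes is to attempt a plain TCP connection from one node to an arbitrary high port on the other. The following is only a minimal sketch in Python, not anything that ships with Open MPI; the function name `probe_tcp`, the host name `node2` and the port `5000` are made up for illustration. A result of "timeout" suggests the connection request is being silently dropped, as described above, while "refused" means the packet got through and simply nothing was listening on that port.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    try:
        # create_connection resolves the host and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote host answered with a reset: reachable, no listener.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all: typical of a firewall that drops packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return f"error: {exc}"


# Example call (host name and port are assumptions, adjust to your nodes):
# print(probe_tcp("node2", 5000))
```

Since SSH usually stays allowed through the firewall, a working SSH login does not rule this out; the probe has to target a port other than 22.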