Hello, I'm running an FDTD program (meep) using Open MPI on a mini-cluster consisting of 2 computers. Since the mainboard of the node was exchanged (for one identical to the old one), I have a problem. I can't find the change in the configuration that is now causing the problem.
Here's my problem: I can start the meep application with mpirun on each node individually, and the program runs without any problems. However, when I try to run the program distributed over both computers, I get the following error message at some point:
...[0,1,1][btl_tcp_endpoint.c: 572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=110
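For reference, the errno value in that message can also be decoded without Perl. This is only a minimal sketch, assuming a Linux system (where errno 110 is ETIMEDOUT); the helper name `describe_errno` is made up for illustration and is not part of Open MPI or meep.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value.

    Hypothetical helper for illustration only.
    """
    # errno.errorcode maps numeric values to names such as "ETIMEDOUT";
    # the numeric values themselves are platform-dependent.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 110 is the value reported by the failed connect() in the message above.
    name, message = describe_errno(110)
    print(f"errno=110 -> {name}: {message}")
```

Because errno numbers differ between operating systems, the lookup should be run on the node that produced the message.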
Perl translates errno 110 as: Connection timed out at -e line 1. However, I can't figure out where the problem lies in my network configuration. SSH tunnels from one computer to the other work. I can also reach the internet from the node.
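Since SSH only exercises one port, while Open MPI's TCP transport opens its own connections between the nodes, a plain TCP probe on other ports might help narrow things down. The following is a rough sketch under stated assumptions, not a diagnosis: the host name `node2` and port 5000 are placeholders, and `probe` is a hypothetical helper.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try a TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error".
    Hypothetical helper for illustration only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are probably dropped somewhere
        # (firewall, wrong interface, routing).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error"


if __name__ == "__main__":
    peer = "node2"  # placeholder: replace with the other node's name or IP
    for port in (22, 5000):  # 22 = SSH; 5000 is an arbitrary example port
        print(f"{peer}:{port} -> {probe(peer, port)}")
```

A "refused" result on an arbitrary port would mean the other machine is reachable on that port but nothing is listening there, whereas "timeout" would match the errno=110 behaviour and point towards filtered traffic or a wrong network interface.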
The attached archive contains the config.log from the top-level Open MPI tree, the output of ompi_info --all, and the network configuration of both computers.
I'm really grateful for any help. Thank you!

Best regards
Roland Albrecht

--
___________________________________________
Roland Albrecht, Dipl. Phys. ETH
-------------------------------------------
Universität des Saarlandes
Fachrichtung 7.3 (Technische Physik)
AG Prof. Dr. Christoph Becher
Campus E2.6, Zimmer 2.04
D-66123 Saarbrücken
Germany
Phone: +49(0)681 302 3418
Fax: +49(0)681 302 4676
skype: roland_albrecht
Attachment: mpi.rar (binary data)