Hello

I'm running an FDTD program (Meep) using Open MPI on a mini-cluster consisting of two computers. Since the mainboard on the node was replaced (with one identical to the previous one), I have had a problem. I can't find the configuration change that is now causing it.

Here's my problem:
I can start the meep application with mpirun on each node individually, and the program runs without any problems. However, when I try to run the program distributed over both computers, I get the following error message at some point: ...[0,1,1][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=110
Perl translates errno 110 as: Connection timed out at -e line 1.
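
For reference, the same errno lookup can be sketched in Python instead of Perl. The helper name describe_errno is only illustrative. Errno numbers are platform specific; on Linux, 110 is ETIMEDOUT.

```python
import errno
import os


def describe_errno(code):
    """Return the symbolic name and system message for an errno value."""
    # errno.errorcode maps numeric codes to names on the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# The value reported by the Open MPI TCP BTL in the message above.
print(describe_errno(110))
```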

However, I can't figure out where the problem lies in my network configuration. SSH tunnels from one computer to the other work. I can also reach the internet from the node.
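
Working SSH only shows that the SSH port is reachable, so a plain TCP connection test on another port might help narrow things down. The following is a minimal sketch of such a check, not something taken from my setup: the function name can_connect, the host name node2 and the port 5000 are placeholders, and a listener has to be running on that port on the other machine for the connection to succeed.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try a plain TCP connection; return (ok, message)."""
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect() call.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers timeouts, refused connections and name resolution errors.
        return False, str(exc)


# Hypothetical usage (host name and port are placeholders):
#   ok, message = can_connect("node2", 5000)
#   print(ok, message)
```

A timeout here, as opposed to an immediate "connection refused", would suggest that packets are being dropped somewhere (for example by a firewall or a wrong route) rather than that nothing is listening.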

The attached archive contains the config.log from the top-level Open MPI tree, the output of ompi_info --all, and the network configuration of both computers.

I'd be really grateful for any help. Thank you!

Best regards
Roland Albrecht

--
___________________________________________

Roland Albrecht, Dipl. Phys. ETH
-------------------------------------------

Universität des Saarlandes
Fachrichtung 7.3 (Technische Physik)
AG Prof. Dr. Christoph Becher
Campus E2.6, Zimmer 2.04
D-66123 Saarbrücken
Germany

Phone: +49(0)681 302 3418
Fax: +49(0)681 302 4676
skype: roland_albrecht

Attachment: mpi.rar
Description: Binary data
