It means the jobs were launched successfully (via ssh), but Open MPI wasn't able to open TCP sockets between the two MPI processes. Open MPI needs to be able to communicate over random TCP ports between its MPI processes.
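
To check whether arbitrary TCP ports are reachable between two nodes independently of Open MPI, a small probe along these lines can help. This is only a sketch: the function name `can_connect` and the default timeout are assumptions for illustration, not part of Open MPI, and something must be listening on the chosen port on the remote side for a successful result.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", "Connection refused", and timeouts.
        return False
```

For example, `can_connect("192.168.0.104", 5000)` run from the first node (with a hypothetical listener on port 5000 on the second) would show whether a non-ssh port gets through. If it returns False while ssh works, a firewall rule between the nodes is a likely candidate.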

On Mar 18, 2009, at 8:39 AM, Bernhard Knapp wrote:

Hey again,

I tried to build a workaround via port redirection:

iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 22 -j REDIRECT --to-port 5101


If I do that then I can start the job:

mpirun -np 2 -machinefile /home/bknapp/scripts/machinefile.txt mdrun -np 2 -nice 0 -s 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.tpr -o 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.trr -c 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.pdb -g 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.log -e 1fyt_PKYVKQNTLELAT_bindingRegionsOnly.md.edr -v
bknapp@192.168.0.104's password:
NNODES=2, MYRANK=0, HOSTNAME=quoVadis01
NNODES=2, MYRANK=1, HOSTNAME=quoVadis04

but it comes up with "[quoVadis01][[24802,1],0][btl_tcp_endpoint.c:631:mca_btl_tcp_endpoint_complete_connect] connect() failed: No route to host (113)". The CPUs on both (physically different) machines are busy calculating, but unfortunately no results are written ...

Was the port redirection of 22 not enough or is there another problem?

thx
Bernhard





-------- Original Message --------
Subject: Re: open mpi on non standard ssh port
Date: Wed, 18 Mar 2009 09:19:18 +0100
From: Bernhard Knapp <bernhard.kn...@meduniwien.ac.at>
To: us...@open-mpi.org
References: <mailman.2006.1237281160.6040.us...@open-mpi.org>


Come on, it must somehow be possible to use Open MPI with ssh on a port other than 22!? ;-)

>
>------------------------------
>
>Message: 3
>Date: Tue, 17 Mar 2009 09:45:29 +0100
>From: Bernhard Knapp <bernhard.kn...@meduniwien.ac.at>
>Subject: [OMPI users] open mpi on non standard ssh port
>To: us...@open-mpi.org
>Message-ID: <49bf6329.8090...@meduniwien.ac.at>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>Hi
>
>I want to start a GROMACS simulation on a small cluster where
>non-standard ports are used for ssh. If I just use a "normal" machinefile
>(with the IPs of the nodes), the following error
>comes up:
>ssh: connect to host 192.168.0.103 port 22: Connection refused
>
>I guess that I need to somehow tell it to use the other ports. I tried
>it in the following way (machinefile):
>192.168.0.101 -p 5101
>192.168.0.102 -p 5102
>192.168.0.103 -p 5103
>192.168.0.104 -p 5104
>
>But it seems this is not the correct way to specify the port:
>Open RTE detected a parse error in the hostfile:
>    /home/bknapp/scripts/machinefile.txt
>It occured on line number 1 on token 5:
>    -p
>
>How can I tell it to use port 5101 on machine 192.168.0.101?
>Maybe the question is stupid, but I could not find a solution via Google
>or the search function ...
>
>cheers
>Bernhard
>
>
>
>
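
One commonly used approach, given here as an assumption rather than a confirmed fix for this cluster: since the remote processes are started through ssh, per-host ports can go into the user's OpenSSH client configuration (`~/.ssh/config`) using the `Host` and `Port` keywords, while the machinefile keeps plain IPs. The helper `ssh_config_entries` below is hypothetical; it only renders such entries from the IP/port pairs in the question.

```python
def ssh_config_entries(ports: dict) -> str:
    """Render one OpenSSH client config stanza per host."""
    stanzas = []
    for host, port in ports.items():
        # "Host" selects the target; "Port" overrides the default of 22.
        stanzas.append(f"Host {host}\n    Port {port}\n")
    return "\n".join(stanzas)


# IP/port pairs taken from the machinefile attempt in the question.
NODE_PORTS = {
    "192.168.0.101": 5101,
    "192.168.0.102": 5102,
    "192.168.0.103": 5103,
    "192.168.0.104": 5104,
}

print(ssh_config_entries(NODE_PORTS))
```

The printed text would be appended to `~/.ssh/config` on the node that runs mpirun. This only covers the ssh launch step; the MPI processes still need to reach each other over TCP afterwards.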


--
Dipl.-Ing. (FH) Bernhard Knapp
Univ.-Ass.postgrad.
Unit for Medical Statistics and Informatics - Section for Biomedical Computersimulation and Bioinformatics
Medical University of Vienna - General Hospital
Spitalgasse 23 A-1090 WIEN / AUSTRIA
Room: BT88 - 88.03.712
Phone: +43(1) 40400-6673


--
Jeff Squyres
Cisco Systems
