Re: [OMPI users] connect failed with errno=111

2007-09-13 Thread Adrian Knoth
On Thu, Sep 13, 2007 at 11:15:47AM -0500, Tim Campbell wrote:
> workstations. When mpirun tries to start the processes on certain
> nodes I get the following error output.
>
> [sr70][0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
> connect() failed with errno=111

Re: [OMPI users] connect failed with errno=111

2007-09-13 Thread Tim Campbell
Thanks. I think I figured out the problem. I found that in my .ssh/known_hosts there were several "bad" keys associated with some of the machines in the gridengine pool. My hypothesis is that when mpirun was establishing the connection topology of the processes there was some process pa

Re: [OMPI users] connect failed with errno=111

2007-09-13 Thread Pak Lui
Hi Tim, You could try setting -mca pls_gridengine_verbose 1 to show whether SGE is able to start the ORTE daemons on the remote nodes successfully. It seems you are hitting the same problem that another user asked about previously. Perhaps you may want to follow this thread and check your ifconfig setting

[OMPI users] connect failed with errno=111

2007-09-13 Thread Tim Campbell
Greetings, I am using OpenMPI v1.2.3 via SGE on a network of amd64 workstations. When mpirun tries to start the processes on certain nodes I get the following error output. [sr70][0,1,2][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=111 [sr71
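
For readers landing on this thread: assuming the nodes run Linux, errno 111 is ECONNREFUSED, meaning the remote host actively rejected the TCP connection. The following is a minimal sketch, not part of Open MPI, for checking basic TCP reachability between two nodes. The helper names probe_tcp and describe are hypothetical, and the port to probe is an assumption, since the TCP BTL chooses its ports dynamically.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt a TCP connect; return 0 on success or the errno value on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns the C-level error code instead of raising for
        # connection errors; name-resolution failures still raise an exception.
        return sock.connect_ex((host, port))


def describe(code):
    """Turn an errno value into a readable label such as 'ECONNREFUSED (111)'."""
    if code == 0:
        return "connected"
    return f"{errno.errorcode.get(code, 'UNKNOWN')} ({code})"
```

Calling describe(probe_tcp("sr70", 22)) from another node, for example, would show whether anything accepts connections on that port (22 is only an illustrative choice, assuming sshd is running there). A successful probe does not prove that the dynamically chosen MPI ports are reachable, so treat the result as a first sanity check only.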