I'm assuming that these are Linux hosts. If so, errno 111 is ECONNREFUSED ("connection refused"), which may mean that a firewall is still active on one of the machines or that the wrong network interface is being used to establish connections between them.

Can you send the output of "ifconfig" (it might be /sbin/ifconfig on your machines) from both machines?
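
If it helps to check the errno 111 reading independently of Open MPI, here is a minimal Python sketch that attempts a plain TCP connection and reports the resulting errno. The function names probe() and describe() are made up for this illustration, and the port you test is your choice: the TCP BTL picks its ports dynamically, so this only shows whether a given host/port pair refuses connections, not which port Open MPI tried.

```python
import errno
import os
import socket


def probe(host, port, timeout=3.0):
    """Attempt a TCP connect; return 0 on success, else the errno value.

    Hypothetical helper for illustration only; not part of Open MPI.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        # connect_ex() returns the error code instead of raising OSError
        # (name resolution failures still raise socket.gaierror).
        return s.connect_ex((host, port))
    finally:
        s.close()


def describe(code):
    """Return a readable 'NAME: message' string for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))
```

For example, running describe(probe("lcbcpc04.epfl.ch", 22)) from lcbcpc02 would show whether the ssh port on the other machine accepts connections (port 22 is only an assumption about what is listening there). A result naming ECONNREFUSED means the remote side actively rejected the connection, which points at a firewall rule or at nothing listening on that address and port.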


On Feb 11, 2007, at 3:45 PM, matteo.guglie...@epfl.ch wrote:

Since I installed Open MPI, I cannot run any job that uses CPUs from
different machines.

### hostfile ###
lcbcpc02.epfl.ch slots=4 max-slots=4
lcbcpc04.epfl.ch slots=4 max-slots=4
################

### error message ###
[matteo@lcbcpc02 TEST]$ mpirun --hostfile ~matteo/hostfile -np 8
/home/matteo/Software/NWChem/5.0/bin/nwchem ./nwchem.nw
[0,1,5][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c: 572:mca_btl_tcp_endpoint_complete_connect] [0,1,6][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c: 572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=111
6: lcbcpc04.epfl.ch len=16
[0,1,4][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c: 572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=111
4: lcbcpc04.epfl.ch len=16
[0,1,7][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c: 572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=111
7: lcbcpc04.epfl.ch len=16
connect() failed with errno=111
5: lcbcpc04.epfl.ch len=16
#####################

I did disable the firewall on both machines but I still get that error message.

Thanks,
MG.


--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
