I'm assuming that these are Linux hosts. If so, errno 111 is
ECONNREFUSED ("Connection refused"), which may mean that a firewall
is still active, or that the wrong network interface is being used to
establish connections between these machines.
Can you send the output of "ifconfig" (it may be /sbin/ifconfig on
your machine) from both machines?
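
If it helps to check this outside of MPI, here is a minimal Python sketch that attempts a single TCP connection and reports the resulting errno by name. The helper names probe() and describe() are made up for illustration and are not part of Open MPI. The demo at the bottom only connects to a closed port on the loopback interface, so it can be run on one machine.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Attempt one TCP connect; return 0 on success, else the errno value.

    Hypothetical helper for illustration only. Name-resolution failures
    still raise socket.gaierror rather than returning an errno.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex() returns the C-level error code instead of raising.
        return s.connect_ex((host, port))


def describe(code):
    """Turn an errno value into a readable string such as 'errno=N (NAME)'."""
    if code == 0:
        return "connected"
    return f"errno={code} ({errno.errorcode.get(code, 'unknown')})"


if __name__ == "__main__":
    # Pick a port that is almost certainly closed: bind to an ephemeral
    # port on loopback, remember its number, then close the socket.
    tmp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tmp.bind(("127.0.0.1", 0))
    closed_port = tmp.getsockname()[1]
    tmp.close()

    # Nothing is listening there, so the connect should be refused.
    print(describe(probe("127.0.0.1", closed_port)))
```

To test between the two hosts, you would call probe() from one machine with the other machine's name and a port where something is known to be listening; that port is your choice, since the TCP BTL picks its own ports at run time. A refusal on a port that should be open points at a firewall or at the wrong interface.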
On Feb 11, 2007, at 3:45 PM, matteo.guglie...@epfl.ch wrote:
Since I installed Open MPI, I cannot run any job that uses CPUs from
different machines.
### hostfile ###
lcbcpc02.epfl.ch slots=4 max-slots=4
lcbcpc04.epfl.ch slots=4 max-slots=4
################
### error message ###
[matteo@lcbcpc02 TEST]$ mpirun --hostfile ~matteo/hostfile -np 8
/home/matteo/Software/NWChem/5.0/bin/nwchem ./nwchem.nw
[0,1,5][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
[0,1,6][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=111
6: lcbcpc04.epfl.ch len=16
[0,1,4][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=111
4: lcbcpc04.epfl.ch len=16
[0,1,7][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=111
7: lcbcpc04.epfl.ch len=16
connect() failed with errno=111
5: lcbcpc04.epfl.ch len=16
#####################
I did disable the firewall on both machines but I still get that
error message.
Thanks,
MG.
--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems