Hi again,

Angel de Vicente <ang...@iac.es> writes:

> yes, that's just what I did with orted. I saw the port that it was
> trying to connect and telnet to it, and I got "No route to host", so
> that's why I was going the firewall path. Hopefully the sysadmins can
> disable the firewall for the internal network today, and I can see if
> that solves the issue.
OK, removing the firewall for the private network improved things a lot. A simple "Hello World" now seems to work without issues, but if I run my own code, I get a problem like this:

[angelv@comer RTI2D.Parallel]$ mpiexec -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 10 ../../../mancha2D_mpi_h5fc.x mancha.trol
[...]
[comer][[58110,1],0][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect]
[comer][[58110,1],3][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],1][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)
[comer][[58110,1],2][btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to 161.72.206.3 failed: No route to host (113)

But MPI_HOSTS points to a file with:

$ cat /net/nas7/polar/minicluster/machinefile-openmpi
c0 slots=5
c1 slots=5
c2 slots=5

c0, c1, and c2 are the names of the machines on the internal network, but for some reason Open MPI is using the public interfaces and complaining (the firewall on those is still active). I thought that just specifying the names of the machines in the machinefile would make sure that we were using the right interface...

Any help?

Thanks,
--
Ángel de Vicente
http://angel-de-vicente.blogspot.com/
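
P.S. From what I can see in the Open MPI documentation, the TCP BTL can apparently be restricted to particular interfaces with the btl_tcp_if_include MCA parameter (and oob_tcp_if_include for the out-of-band channel). Assuming the private interface on the nodes is called eth1 (I still have to check the real name on our machines), I was thinking of trying something like:

mpiexec --mca btl_tcp_if_include eth1 --mca oob_tcp_if_include eth1 \
        -prefix $OMPI_PREFIX -hostfile $MPI_HOSTS -n 10 \
        ../../../mancha2D_mpi_h5fc.x mancha.trol

but I don't know if that is the recommended way, or whether the hostnames in the machinefile should already be enough to select the right interface.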