What are the settings of the firewall on your 2 nodes?

George.
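A rough way to test whether a firewall is blocking the TCP connections that the ORTE daemons open between the nodes is to try a plain TCP connection from one host to the other. The sketch below is a hypothetical bash helper (`probe_tcp` is an illustrative name, not part of Open MPI). It relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command, and it assumes you have started some TCP listener on an unused high port on host2 (port 50000 is an arbitrary example).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Open MPI): probe_tcp HOST PORT [SECONDS]
# Opens a TCP connection through bash's /dev/tcp pseudo-device.
# Returns 0 if the connection succeeds; returns 1 if it is refused,
# rejected, or does not complete within SECONDS (default 3). On failure,
# the inner bash prints the reason (e.g. "Connection refused") on stderr.
probe_tcp() {
  local host=$1 port=$2 secs=${3:-3}
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}"; then
    echo "open: ${host}:${port}"
  else
    echo "no connection: ${host}:${port}"
    return 1
  fi
}

# Assumed workflow: start any listener on an unused high port on host2
# (for example "nc -l 50000" if nmap-ncat is installed), then on host1 run:
#   probe_tcp host2.domain 50000
```

As a rough guide, assuming stock firewalld rules on CentOS 7, a rejected connection usually shows up on stderr as "No route to host", a silently dropped one as the probe giving up after a few seconds with no message, and "Connection refused" means the packet reached host2 but nothing was listening on that port. Because the daemons choose their ports dynamically by default, a single successful probe does not prove the firewall is harmless, but a failure on an arbitrary high port is a strong hint.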
On Fri, Feb 9, 2018 at 3:08 PM, William Mitchell <wfma...@gmail.com> wrote:
> When I try to run an MPI program on a network with a shared file system
> and connected by ethernet, I get the error message "tcp_peer_send_blocking:
> send() to socket 9 failed: Broken pipe (32)" followed by some suggestions
> of what could cause it, none of which are my problem. I have searched the
> FAQ, mailing list archives, and googled the error message, with only a few
> hits touching on it, none of which solved the problem.
>
> This is on a Linux CentOS 7 system with Open MPI 1.10.6 and Intel Fortran
> (more detailed system information below).
>
> Here are details on how I encounter the problem:
>
> me@host1> cat hellompi.f90
> program hello
> include 'mpif.h'
> integer rank, size, ierror, nl
> character(len=MPI_MAX_PROCESSOR_NAME) :: hostname
>
> call MPI_INIT(ierror)
> call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
> call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
> call MPI_GET_PROCESSOR_NAME(hostname, nl, ierror)
> print*, 'node', rank, ' of', size, ' on ', hostname(1:nl), ': Hello world'
> call MPI_FINALIZE(ierror)
> end
>
> me@host1> mpifort --showme
> ifort -I/usr/include/openmpi-x86_64 -pthread -m64 -I/usr/lib64/openmpi/lib -Wl,-rpath -Wl,/usr/lib64/openmpi/lib -Wl,--enable-new-dtags -L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh -lmpi
>
> me@host1> ifort --version
> ifort (IFORT) 18.0.0 20170811
> Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
>
> me@host1> mpifort -o hellompi hellompi.f90
>
> [Note: it runs on 1 machine, but not on two]
>
> me@host1> mpirun -np 2 hellompi
> node 0 of 2 on host1.domain: Hello world
> node 1 of 2 on host1.domain: Hello world
>
> me@host1> cat hosts
> host2.domain
> host1.domain
>
> me@host1> mpirun -np 2 --hostfile hosts hellompi
> [host2.domain:250313] [[46562,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> [suggested causes deleted]
>
> Here is system information:
>
> me@host2> cat /etc/redhat-release
> CentOS Linux release 7.4.1708 (Core)
>
> me@host1> uname -a
> Linux host1.domain 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> me@host1> rpm -qa | grep openmpi
> mpitests-openmpi-4.1-1.el7.x86_64
> openmpi-1.10.6-2.el7.x86_64
> openmpi-devel-1.10.6-2.el7.x86_64
>
> me@host1> ompi_info --all
> [Results of this command for each host are in the attached files.]
>
> me@host1> ompi_info -v ompi full --parsable
> ompi_info: Error: unknown option "-v"
> [Is the request to run that command given on the Open MPI "Getting Help"
> web page an error?]
>
> me@host1> printenv | grep OMPI
> MPI_COMPILER=openmpi-x86_64
> OMPI_F77=ifort
> OMPI_FC=ifort
> OMPI_MCA_mpi_yield_when_idle=1
> OMPI_MCA_btl=tcp,self
>
> I am using ssh-agent, and I can ssh between the two hosts. In fact, from
> host1 I can use ssh to request that host2 ssh back to host1:
>
> me@host1> ssh -A host2 "ssh host1 hostname"
> host1.domain
>
> Any suggestions on how to solve this problem are appreciated.
>
> Bill
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users