What are the firewall settings on your two nodes?
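
If it helps to test this directly, here is a rough sketch (not anything Open MPI provides) for checking whether one node can open a TCP connection to an arbitrary port on the other. The helper names open_listener and can_connect are made up for illustration. The idea is to run open_listener() on host2, note the port it prints, and then call can_connect("host2.domain", port) from host1. A False result on a port that is known to be listening would be consistent with a firewall blocking the traffic, though it does not prove it.

```python
import socket


def open_listener(bind_addr="0.0.0.0"):
    """Listen on an OS-chosen ephemeral TCP port; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((bind_addr, 0))  # port 0 asks the OS to pick a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Local self-check; on real nodes, run the listener on one host and
    # call can_connect() with that host's name from the other.
    srv, port = open_listener("127.0.0.1")
    print("listening on port", port)
    print("reachable:", can_connect("127.0.0.1", port))
    srv.close()
```

The check is worth repeating in both directions (host1 to host2 and host2 to host1), since each side has to be able to reach the other.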

  George.



On Fri, Feb 9, 2018 at 3:08 PM, William Mitchell <wfma...@gmail.com> wrote:

> When I try to run an MPI program on a network with a shared file system
> and connected by ethernet, I get the error message "tcp_peer_send_blocking:
> send() to socket 9 failed: Broken pipe (32)" followed by some suggestions
> of what could cause it, none of which are my problem.  I have searched the
> FAQ, mailing list archives, and googled the error message, with only a few
> hits touching on it, none of which solved the problem.
>
> This is on a Linux CentOS 7 system with Open MPI 1.10.6 and Intel Fortran
> (more detailed system information below).
>
> Here are details on how I encounter the problem:
>
> me@host1> cat hellompi.f90
>    program hello
>    include 'mpif.h'
>    integer rank, size, ierror, nl
>    character(len=MPI_MAX_PROCESSOR_NAME) :: hostname
>
>    call MPI_INIT(ierror)
>    call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
>    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
>    call MPI_GET_PROCESSOR_NAME(hostname, nl, ierror)
>    print*, 'node', rank, ' of', size, ' on ', hostname(1:nl), ': Hello world'
>    call MPI_FINALIZE(ierror)
>    end
>
> me@host1> mpifort --showme
> ifort -I/usr/include/openmpi-x86_64 -pthread -m64 -I/usr/lib64/openmpi/lib
> -Wl,-rpath -Wl,/usr/lib64/openmpi/lib -Wl,--enable-new-dtags
> -L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh -lmpi
>
> me@host1> ifort --version
> ifort (IFORT) 18.0.0 20170811
> Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.
>
> me@host1> mpifort -o hellompi hellompi.f90
>
> [Note: it runs on 1 machine, but not on two]
>
> me@host1> mpirun -np 2 hellompi
>  node           0  of           2  on host1.domain: Hello world
>  node           1  of           2  on host1.domain: Hello world
>
> me@host1> cat hosts
> host2.domain
> host1.domain
>
> me@host1> mpirun -np 2 --hostfile hosts hellompi
> [host2.domain:250313] [[46562,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> [suggested causes deleted]
>
> Here is system information:
>
> me@host2> cat /etc/redhat-release
> CentOS Linux release 7.4.1708 (Core)
>
> me@host1> uname -a
> Linux host1.domain 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58
> UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
>
> me@host1> rpm -qa | grep openmpi
> mpitests-openmpi-4.1-1.el7.x86_64
> openmpi-1.10.6-2.el7.x86_64
> openmpi-devel-1.10.6-2.el7.x86_64
>
> me@host1> ompi_info --all
> [Results of this command for each host are in the attached files.]
>
> me@host1> ompi_info -v ompi full --parsable
> ompi_info: Error: unknown option "-v"
> [Is the request to run that command given on the Open MPI "Getting Help"
> web page an error?]
>
> me@host1> printenv | grep OMPI
> MPI_COMPILER=openmpi-x86_64
> OMPI_F77=ifort
> OMPI_FC=ifort
> OMPI_MCA_mpi_yield_when_idle=1
> OMPI_MCA_btl=tcp,self
>
> I am using ssh-agent, and I can ssh between the two hosts.  In fact, from
> host1 I can use ssh to request that host2 ssh back to host1:
>
> me@host1> ssh -A host2 "ssh host1 hostname"
> host1.domain
>
> Any suggestions on how to solve this problem are appreciated.
>
> Bill
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
>