[OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)

2018-02-09 Thread William Mitchell
When I try to run an MPI program across two machines on a network with a
shared file system, connected by Ethernet, I get the error message
"tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)"
followed by a list of possible causes, none of which applies to my setup.
I have searched the FAQ and the mailing list archives and googled the error
message; the few hits that touch on it did not solve the problem.

This is on a Linux CentOS 7 system with Open MPI 1.10.6 and Intel Fortran
(more detailed system information below).

Here are details on how I encounter the problem:

me@host1> cat hellompi.f90
   program hello
   include 'mpif.h'
   integer rank, size, ierror, nl
   character(len=MPI_MAX_PROCESSOR_NAME) :: hostname

   call MPI_INIT(ierror)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
   call MPI_GET_PROCESSOR_NAME(hostname, nl, ierror)
   print*, 'node', rank, ' of', size, ' on ', hostname(1:nl), ': Hello world'
   call MPI_FINALIZE(ierror)
   end

me@host1> mpifort --showme
ifort -I/usr/include/openmpi-x86_64 -pthread -m64 -I/usr/lib64/openmpi/lib
-Wl,-rpath -Wl,/usr/lib64/openmpi/lib -Wl,--enable-new-dtags
-L/usr/lib64/openmpi/lib -lmpi_usempi -lmpi_mpifh -lmpi

me@host1> ifort --version
ifort (IFORT) 18.0.0 20170811
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

me@host1> mpifort -o hellompi hellompi.f90

[Note: it runs on one machine, but not on two]

me@host1> mpirun -np 2 hellompi
 node   0  of   2  on host1.domain: Hello world
 node   1  of   2  on host1.domain: Hello world

me@host1> cat hosts
host2.domain
host1.domain

me@host1> mpirun -np 2 --hostfile hosts hellompi
[host2.domain:250313] [[46562,0],1] tcp_peer_send_blocking: send() to
socket 9 failed: Broken pipe (32)
--
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
[suggested causes deleted]

Here is system information:

me@host2> cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)

me@host1> uname -a
Linux host1.domain 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58
UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

me@host1> rpm -qa | grep openmpi
mpitests-openmpi-4.1-1.el7.x86_64
openmpi-1.10.6-2.el7.x86_64
openmpi-devel-1.10.6-2.el7.x86_64

me@host1> ompi_info --all
[Results of this command for each host are in the attached files.]

me@host1> ompi_info -v ompi full --parsable
ompi_info: Error: unknown option "-v"
[Is the request to run that command given on the Open MPI "Getting Help"
web page an error?]

me@host1> printenv | grep OMPI
MPI_COMPILER=openmpi-x86_64
OMPI_F77=ifort
OMPI_FC=ifort
OMPI_MCA_mpi_yield_when_idle=1
OMPI_MCA_btl=tcp,self

I am using ssh-agent, and I can ssh between the two hosts.  In fact, from
host1 I can use ssh to request that host2 ssh back to host1:

me@host1> ssh -A host2 "ssh host1 hostname"
host1.domain

Any suggestions on how to solve this problem are appreciated.

Bill


ompi_info_all_host1.bz2
Description: BZip2 compressed data


ompi_info_all_host2.bz2
Description: BZip2 compressed data
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)

2018-02-12 Thread William Mitchell
Thanks, George.  My sysadmin now says he is pretty sure it is the firewall,
but that the firewall "isn't going to change", so we need to find a solution
that works with it in place.
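
The ssh test in the original message only exercises port 22, so it can succeed even when a firewall blocks the other TCP ports that mpirun and its remote daemons use to reach each other. One way to confirm that before changing anything is a small connectivity probe. The following is a minimal sketch in Python, not anything shipped with Open MPI; the helper names can_connect and listen_once are hypothetical and exist only for this illustration.

```python
import socket


def listen_once(port: int = 0, bind_addr: str = "0.0.0.0") -> socket.socket:
    """Open a listening TCP socket (hypothetical helper).

    Passing port=0 lets the operating system pick a free port, which is
    roughly the situation when a program does not pin its ports.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((bind_addr, port))
    srv.listen(1)
    return srv


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds in time.

    A refused connection, an unreachable host, or a timeout (a firewall
    silently dropping packets usually shows up as a timeout) all count
    as failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

To use it, call listen_once() on host1 and note the port from srv.getsockname()[1], then call can_connect("host1.domain", port) on host2, and repeat in the other direction. If the probe fails on an arbitrary port while ssh still works, that is consistent with the firewall explanation, and the remaining question is which port range the firewall can be made to allow.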

On 9 February 2018 at 16:58, George Bosilca wrote:

> What are the settings of the firewall on your 2 nodes ?
>
>   George.