Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)

2018-02-12 Thread Gilles Gouaillardet
William, On a typical HPC cluster, the internal interface is not protected by the firewall. If this is eth0, then you can mpirun --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 ... If only a small range of ports is available, then you will also need to use the oob_tcp_dynamic_ipv4_ports MCA parameter…
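
A sketch of the full command line this advice points to (the hostfile name, program name, and port range below are illustrative placeholders, not taken from the message):

    mpirun --mca oob_tcp_if_include eth0 \
           --mca btl_tcp_if_include eth0 \
           --mca oob_tcp_dynamic_ipv4_ports 10000-10100 \
           -np 2 --hostfile hosts.txt ./my_mpi_app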

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)

2018-02-12 Thread William Mitchell
Thanks, George. My sysadmin now says he is pretty sure it is the firewall, but that "isn't going to change", so we need to find a solution.
On 9 February 2018 at 16:58, George Bosilca wrote:
> What are the settings of the firewall on your 2 nodes?
> George.
> On Fri, Feb 9, 2018 at 3:…
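
One direction when the firewall rules themselves cannot be relaxed is to pin Open MPI to a fixed port range and ask the sysadmin to open only that range. A sketch, assuming Open MPI's TCP out-of-band and BTL port parameters (the specific port numbers are illustrative):

    mpirun --mca oob_tcp_static_ipv4_ports 46000-46100 \
           --mca btl_tcp_port_min_v4 46200 \
           --mca btl_tcp_port_range_v4 100 \
           -np 2 --hostfile hosts.txt ./my_mpi_app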

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)

2018-02-09 Thread George Bosilca
What are the settings of the firewall on your 2 nodes? George.
On Fri, Feb 9, 2018 at 3:08 PM, William Mitchell wrote:
> When I try to run an MPI program on a network with a shared file system and connected by ethernet, I get the error message "tcp_peer_send_blocking: send() to socket…
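
For reference, two common ways to inspect the firewall on Linux nodes (which tool applies depends on the distribution; these are standard commands, not quoted from the thread):

    iptables -L -n            # list netfilter rules
    firewall-cmd --list-all   # firewalld systems, e.g. RHEL/CentOS 7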

[OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)

2018-02-09 Thread William Mitchell
When I try to run an MPI program on a network with a shared file system and connected by ethernet, I get the error message "tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)" followed by some suggestions of what could cause it, none of which are my problem. I have searched the FAQ…
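
A quick way to isolate a failure like this to the runtime wire-up rather than the application is to launch a non-MPI binary across the same nodes (nodeA and nodeB below are placeholder hostnames):

    mpirun -np 2 --host nodeA,nodeB hostname

If even this fails with tcp_peer_send_blocking, the problem lies in Open MPI's out-of-band TCP connections (firewall or interface selection), not in the MPI code itself.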

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers

2016-06-24 Thread Jeff Squyres (jsquyres)
Ok, great. I've opened up https://github.com/open-mpi/ompi/pull/1814 to track the issue. This hack-around certainly isn't going to ship in an Open MPI production tarball; we should probably do something more formal / correct.
> On Jun 24, 2016, at 10:31 AM, kna...@gmail.com wrote:
>
> Jeff,…

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers

2016-06-24 Thread knawnd
Jeff, It works now! Thank you so much!
[user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 2 --hostfile mpi_hosts.txt hostname
ct110
ct111
[user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpiru…

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers

2016-06-24 Thread Jeff Squyres (jsquyres)
On Jun 24, 2016, at 7:26 AM, kna...@gmail.com wrote:
>> mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 ...
>>
>> See if that works.
>
> Jeff, thanks a lot for such a prompt reply, detailed explanation and suggestion! But unfortunately the error is still the same:…
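
When the error persists after restricting interfaces, a generic Open MPI debugging step (not something quoted in this preview) is to raise the verbosity of the runtime and BTL frameworks to see which addresses each side is actually trying; oob_base_verbose and btl_base_verbose are standard MCA verbosity knobs:

    mpirun --mca oob_base_verbose 100 --mca btl_base_verbose 100 \
           --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 \
           -npernode 1 -np 2 --hostfile mpi_hosts.txt hostname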

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers

2016-06-24 Thread knawnd
Jeff Squyres (jsquyres) wrote on 24/06/16 13:43:
Nikolay -- Thanks for all the detail! That helps a tremendous amount. Open MPI actually uses IP networks in *two* ways:
1. for command and control
2. for MPI communications
Your use of btl_tcp_if_include regulates #2, but not #1 -- you need to…

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers

2016-06-24 Thread Jeff Squyres (jsquyres)
Nikolay -- Thanks for all the detail! That helps a tremendous amount. Open MPI actually uses IP networks in *two* ways:
1. for command and control
2. for MPI communications
Your use of btl_tcp_if_include regulates #2, but not #1 -- you need to add another MCA param to regulate #1. Try this:
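
The command being suggested, reconstructed from the quoted text in the follow-up reply (the trailing "..." stands for the rest of the user's usual mpirun arguments):

    mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 ...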

[OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32) on openvz containers

2016-06-24 Thread knawnd
Hi all! I am trying to build a cluster for MPI jobs using OpenVZ containers (https://openvz.org/Main_Page). I've been successfully using OpenVZ + Open MPI for many years, but I can't make it work with Open MPI 1.10.x. So I have a server with OpenVZ support enabled. The output of its ifconfig:
[r…
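
For context, the mpi_hosts.txt referenced later in the thread would simply list one container hostname per line; the contents below are an assumption based on the hostnames that appear in the follow-up:

    ct110
    ct111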