OK, great. I've opened https://github.com/open-mpi/ompi/pull/1814 to track the issue. This hackaround certainly isn't going to ship in an Open MPI production tarball; we should probably do something more formal / correct.
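
As a rough illustration of a less hard-coded variant of the hackaround quoted below, here is a minimal sketch, assuming the interface names to skip came from a comma-separated list instead of a literal "venet0". The helper name if_name_is_excluded and the list format are hypothetical; this is not existing Open MPI code, and not necessarily what the PR will end up doing:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an Open MPI function): return 1 if if_name
 * exactly matches one entry of the comma-separated exclude_list, else 0.
 */
int if_name_is_excluded(const char *if_name, const char *exclude_list)
{
    if (if_name == NULL || exclude_list == NULL) {
        return 0;
    }

    size_t name_len = strlen(if_name);
    if (name_len == 0) {
        return 0;  /* an empty name never matches */
    }

    const char *entry = exclude_list;
    while (*entry != '\0') {
        /* Length of the current entry, up to the next comma or end of string. */
        size_t entry_len = strcspn(entry, ",");

        /* Exact match only, like the strcmp() in the hack. */
        if (entry_len == name_len && strncmp(entry, if_name, name_len) == 0) {
            return 1;
        }

        entry += entry_len;
        if (*entry == ',') {
            entry++;  /* step over the separator */
        }
    }
    return 0;
}
```

With an exclude list of "venet0", this matches venet0 but not venet0:0, which mirrors the exact comparison in the hack: the base interface is skipped while the alias stays usable.
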

> On Jun 24, 2016, at 10:31 AM, kna...@gmail.com wrote:
>
> Jeff, It works now! Thank you so much!
>
> [user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca
> btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np
> 2 --hostfile mpi_hosts.txt hostname
> ct110
> ct111
>
> [user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca
> btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np
> 2 --hostfile mpi_hosts.txt ./hello.bin
> Hello world! from processor 0 (name=ct110 ) out of 2
> wall clock time = 0.000001
> Hello world! from processor 1 (name=ct111 ) out of 2
> wall clock time = 0.000002
>
> It's not even needed to specify venet0:0:
> [user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun -npernode 1 -np 2
> --hostfile mpi_hosts.txt ./hello.bin
> Hello world! from processor 0 (name=ct110 ) out of 2
> wall clock time = 0.000002
> Hello world! from processor 1 (name=ct111 ) out of 2
> wall clock time = 0.000001
>
> Thanks a lot indeed!
>
>
> Jeff Squyres (jsquyres) wrote on 24/06/16 16:08:
>> On Jun 24, 2016, at 7:26 AM, kna...@gmail.com wrote:
>>>
>>>> mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include
>>>> venet0:0 ...
>>>> See if that works.
>>> Jeff, thanks a lot for such prompt reply, detailed explanation and
>>> suggestion! But unfortunately the error is still the same:
>>>
>>> [user@ct110 hello]$ /opt/openmpi/1.10.3/bin/mpirun --mca btl self,tcp --mca
>>> btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -np 1 -host
>>> 10.0.50.41 hostname
>>> [ct111.domain.org:01054] [[12888,0],1] tcp_peer_send_blocking: send() to
>>> socket 9 failed: Broken pipe (32)
>>> [...snip...]
>>
>> I'm reminded of the fact that we did some tests on OpenVZ recently, and I
>> had to add a hack to make Open MPI skip one of the interfaces:
>>
>> -----
>> diff --git a/opal/mca/if/posix_ipv4/if_posix.c
>> b/opal/mca/if/posix_ipv4/if_posix
>> index 6f75533..ed447e7 100644
>> --- a/opal/mca/if/posix_ipv4/if_posix.c
>> +++ b/opal/mca/if/posix_ipv4/if_posix.c
>> @@ -221,6 +221,15 @@ static int if_posix_open(void)
>>          strncpy(intf->if_name, ifr->ifr_name, sizeof(intf->if_name) - 1);
>>          intf->if_flags = ifr->ifr_flags;
>>
>> +        // JMS Hackaround for OpenVZ
>> +        if (strcmp(intf->if_name, "venet0") == 0) {
>> +            opal_output_verbose(1, opal_if_base_framework.framework_output,
>> +                                "OpenVZ hack:%s:%d: skipping interface venet0",
>> +                                __FILE__, __LINE__);
>> +            OBJ_RELEASE(intf);
>> +            continue;
>> +        }
>> +
>>          /* every new address gets its own internal if_index */
>>          intf->if_index = opal_list_get_size(&opal_if_list)+1;
>>
>> -----
>>
>> Can you try this and see if it works for you?
>>
>> If so, we might need to do something a bit more methodical / deliberate to
>> make Open MPI work on OpenVZ.
>>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29544.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/