Jeff, it works now! Thank you so much!

[user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 2 --hostfile mpi_hosts.txt hostname
ct110
ct111

[user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 2 --hostfile mpi_hosts.txt ./hello.bin
Hello world! from processor 0 (name=ct110 ) out of 2
wall clock time = 0.000001
Hello world! from processor 1 (name=ct111 ) out of 2
wall clock time = 0.000002
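
(The mpi_hosts.txt referenced above is a standard Open MPI hostfile, one host per line; its exact contents aren't shown in this thread, but it would be something along these lines, with optional slots counts since -npernode 1 is used:)

-----
ct110
ct111
-----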

It's not even necessary to specify venet0:0:
[user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun -npernode 1 -np 2 --hostfile mpi_hosts.txt ./hello.bin
Hello world! from processor 0 (name=ct110 ) out of 2
wall clock time = 0.000002
Hello world! from processor 1 (name=ct111 ) out of 2
wall clock time = 0.000001
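
(hello.bin is just a trivial MPI hello-world; its source isn't part of this thread, so the sketch below is only a guess at what it might look like, compiled with something like /opt/openmpi/1.10.3-1/bin/mpicc hello.c -o hello.bin.)

-----
/* hello.c -- hypothetical reconstruction of hello.bin; the real source
 * is not shown in this thread, so the details are assumptions. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];
    double t0, t1;

    MPI_Init(&argc, &argv);
    t0 = MPI_Wtime();

    /* Which rank am I, how many ranks are there, and on which node? */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &name_len);
    t1 = MPI_Wtime();

    printf("Hello world! from processor %d (name=%s ) out of %d\n",
           rank, name, size);
    printf("wall clock time = %f\n", t1 - t0);

    MPI_Finalize();
    return 0;
}
-----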

Thanks a lot indeed!


Jeff Squyres (jsquyres) wrote on 24/06/16 16:08:
On Jun 24, 2016, at 7:26 AM, kna...@gmail.com wrote:

     mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 
...
See if that works.
Jeff, thanks a lot for such a prompt reply, detailed explanation, and suggestion!
But unfortunately the error is still the same:

[user@ct110 hello]$ /opt/openmpi/1.10.3/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -np 1 -host 10.0.50.41 hostname
[ct111.domain.org:01054] [[12888,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
[...snip...]

I'm reminded that we did some tests on OpenVZ recently, and I had to add a hack to make Open MPI skip one of the interfaces:

-----
diff --git a/opal/mca/if/posix_ipv4/if_posix.c b/opal/mca/if/posix_ipv4/if_posix.c
index 6f75533..ed447e7 100644
--- a/opal/mca/if/posix_ipv4/if_posix.c
+++ b/opal/mca/if/posix_ipv4/if_posix.c
@@ -221,6 +221,15 @@ static int if_posix_open(void)
         strncpy(intf->if_name, ifr->ifr_name, sizeof(intf->if_name) - 1);
         intf->if_flags = ifr->ifr_flags;

+        // JMS Hackaround for OpenVZ
+        if (strcmp(intf->if_name, "venet0") == 0) {
+            opal_output_verbose(1, opal_if_base_framework.framework_output,
+                                "OpenVZ hack:%s:%d: skipping interface venet0",
+                                __FILE__, __LINE__);
+            OBJ_RELEASE(intf);
+            continue;
+        }
+
         /* every new address gets its own internal if_index */
         intf->if_index = opal_list_get_size(&opal_if_list)+1;

-----

Can you try this and see if it works for you?

If so, we might need to do something a bit more methodical / deliberate to make Open MPI work on OpenVZ.
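
(As a rough sketch of what "more methodical" could mean: instead of hard-coding "venet0", the interface names to skip could be taken from a parameter. The standalone program below only illustrates that matching logic in plain C; the environment variable name OMPI_SKIP_IFS and the whole mechanism are invented for this example and are not something Open MPI actually provides.)

-----
/* skip_ifs.c -- standalone illustration only (not Open MPI code):
 * filter interface names against a comma-separated skip list taken
 * from a made-up environment variable, OMPI_SKIP_IFS. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if "name" appears in the comma-separated "list", else 0. */
static int in_skip_list(const char *name, const char *list)
{
    char *copy, *tok, *save;
    int found = 0;

    if (NULL == list) {
        return 0;
    }
    copy = strdup(list);
    for (tok = strtok_r(copy, ",", &save); NULL != tok;
         tok = strtok_r(NULL, ",", &save)) {
        if (0 == strcmp(name, tok)) {
            found = 1;
            break;
        }
    }
    free(copy);
    return found;
}

int main(void)
{
    /* Example interface names; a real implementation would iterate
     * over whatever the OS reports. */
    const char *ifs[] = { "lo", "eth0", "venet0", "venet0:0" };
    const char *skip = getenv("OMPI_SKIP_IFS");   /* e.g. "venet0" */
    size_t i;

    for (i = 0; i < sizeof(ifs) / sizeof(ifs[0]); ++i) {
        printf("%s interface %s\n",
               in_skip_list(ifs[i], skip) ? "skipping" : "keeping",
               ifs[i]);
    }
    return 0;
}
-----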
