Jeff, it works now! Thank you so much!
[user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 2 --hostfile mpi_hosts.txt hostname
ct110
ct111
[user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 2 --hostfile mpi_hosts.txt ./hello.bin
Hello world! from processor 0 (name=ct110 ) out of 2
wall clock time = 0.000001
Hello world! from processor 1 (name=ct111 ) out of 2
wall clock time = 0.000002
It's not even necessary to specify venet0:0:
[user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun -npernode 1 -np 2 --hostfile mpi_hosts.txt ./hello.bin
Hello world! from processor 0 (name=ct110 ) out of 2
wall clock time = 0.000002
Hello world! from processor 1 (name=ct111 ) out of 2
wall clock time = 0.000001
Thanks a lot indeed!
Jeff Squyres (jsquyres) wrote on 24/06/16 16:08:
On Jun 24, 2016, at 7:26 AM, kna...@gmail.com wrote:
mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0
...
See if that works.
Jeff, thanks a lot for such a prompt reply, detailed explanation, and suggestion!
Unfortunately, the error is still the same:
[user@ct110 hello]$ /opt/openmpi/1.10.3/bin/mpirun --mca btl self,tcp --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -np 1 -host 10.0.50.41 hostname
[ct111.domain.org:01054] [[12888,0],1] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)
[...snip...]
I'm reminded of the fact that we did some tests on OpenVZ recently, and I had
to add a hack to make Open MPI skip one of the interfaces:
-----
diff --git a/opal/mca/if/posix_ipv4/if_posix.c b/opal/mca/if/posix_ipv4/if_posix
index 6f75533..ed447e7 100644
--- a/opal/mca/if/posix_ipv4/if_posix.c
+++ b/opal/mca/if/posix_ipv4/if_posix.c
@@ -221,6 +221,15 @@ static int if_posix_open(void)
         strncpy(intf->if_name, ifr->ifr_name, sizeof(intf->if_name) - 1);
         intf->if_flags = ifr->ifr_flags;
+        // JMS Hackaround for OpenVZ
+        if (strcmp(intf->if_name, "venet0") == 0) {
+            opal_output_verbose(1, opal_if_base_framework.framework_output,
+                                "OpenVZ hack:%s:%d: skipping interface venet0",
+                                __FILE__, __LINE__);
+            OBJ_RELEASE(intf);
+            continue;
+        }
+
         /* every new address gets its own internal if_index */
         intf->if_index = opal_list_get_size(&opal_if_list)+1;
-----
Can you try this and see if it works for you?
If so, we might need to do something a bit more methodical / deliberate to make
Open MPI work on OpenVZ.
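
For anyone reading along, here is a minimal standalone sketch of the idea behind the hack above: drop interfaces whose name exactly matches an entry on an exclude list. It is not Open MPI code. The helper names iface_is_excluded and filter_ifaces are hypothetical, and the interface names are passed in as plain strings rather than discovered from the system. One thing the sketch makes visible is that an exact strcmp() match on "venet0" does not match the alias name "venet0:0", so whether an alias gets skipped depends on which name the caller hands in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI API): returns true when if_name
 * exactly matches one of the names on the exclude list. */
static bool iface_is_excluded(const char *if_name,
                              const char *const *excluded,
                              size_t n_excluded)
{
    assert(if_name != NULL);
    for (size_t i = 0; i < n_excluded; ++i) {
        /* Exact match only: "venet0" does not match "venet0:0". */
        if (strcmp(if_name, excluded[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: copies the names that survive the filter into
 * kept (which must have room for n_names entries) and returns how many
 * were kept. Order is preserved. */
static size_t filter_ifaces(const char *const *names, size_t n_names,
                            const char *const *excluded, size_t n_excluded,
                            const char **kept)
{
    size_t n_kept = 0;
    for (size_t i = 0; i < n_names; ++i) {
        if (iface_is_excluded(names[i], excluded, n_excluded)) {
            continue; /* same "skip and move on" shape as the patch */
        }
        kept[n_kept++] = names[i];
    }
    return n_kept;
}
```

A caller would declare the name lists as const char *const arrays, for example {"lo", "venet0", "venet0:0"} with an exclude list of {"venet0"}, and pass a scratch array of the same length for kept. A more deliberate fix would presumably take the exclude list from configuration instead of hard-coding a name, but that is a guess on my part and not something discussed in this thread.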