Ok, great.

I've opened up https://github.com/open-mpi/ompi/pull/1814 to track the issue.  
This hackaround certainly isn't going to ship in an Open MPI production 
tarball; we should probably do something more formal / correct.
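
As a rough sketch only of what "more formal" might look like (this is an assumption for illustration, not what the PR does): the hard-coded strcmp() against "venet0" could become a lookup in a comma-separated exclude list. The helper name if_name_is_excluded and the list format are hypothetical and are not existing Open MPI code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (NOT part of Open MPI): return true if if_name
 * exactly matches one entry of the comma-separated list exclude_csv.
 * Matching is exact per entry, like the strcmp() in the hackaround,
 * so "venet0" does not match an alias such as "venet0:0".
 */
static bool if_name_is_excluded(const char *if_name, const char *exclude_csv)
{
    if (NULL == if_name || NULL == exclude_csv) {
        return false;
    }

    size_t name_len = strlen(if_name);
    if (0 == name_len) {
        return false;               /* never match empty list entries */
    }

    const char *entry = exclude_csv;
    while ('\0' != *entry) {
        /* Length of the current entry: up to the next comma or the end. */
        size_t entry_len = strcspn(entry, ",");

        if (entry_len == name_len && 0 == strncmp(entry, if_name, name_len)) {
            return true;
        }

        entry += entry_len;
        if (',' == *entry) {
            ++entry;                /* step over the separator */
        }
    }
    return false;
}
```

With something like this, the check in if_posix_open() would call if_name_is_excluded(intf->if_name, list) instead of comparing against a fixed string; where the list comes from (a run-time parameter, platform detection, etc.) is left open here.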


> On Jun 24, 2016, at 10:31 AM, kna...@gmail.com wrote:
> 
> Jeff, It works now! Thank you so much!
> 
> [user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca 
> btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 
> 2 --hostfile mpi_hosts.txt hostname
> ct110
> ct111
> 
> [user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun --mca btl self,tcp --mca 
> btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -npernode 1 -np 
> 2 --hostfile mpi_hosts.txt ./hello.bin
> Hello world! from processor 0 (name=ct110 ) out of 2
> wall clock time = 0.000001
> Hello world! from processor 1 (name=ct111 ) out of 2
> wall clock time = 0.000002
> 
> It's not even necessary to specify venet0:0:
> [user@ct110 hello]$ /opt/openmpi/1.10.3-1/bin/mpirun -npernode 1 -np 2 
> --hostfile mpi_hosts.txt ./hello.bin
> Hello world! from processor 0 (name=ct110 ) out of 2
> wall clock time = 0.000002
> Hello world! from processor 1 (name=ct111 ) out of 2
> wall clock time = 0.000001
> 
> Thanks a lot indeed!
> 
> 
> Jeff Squyres (jsquyres) wrote on 24/06/16 16:08:
>> On Jun 24, 2016, at 7:26 AM, kna...@gmail.com wrote:
>>> 
>>>>     mpirun --mca btl_tcp_if_include venet0:0 --mca oob_tcp_if_include 
>>>> venet0:0 ...
>>>> See if that works.
>>> Jeff, thanks a lot for such a prompt reply, detailed explanation, and 
>>> suggestion! But unfortunately the error is still the same:
>>> 
>>> [user@ct110 hello]$ /opt/openmpi/1.10.3/bin/mpirun --mca btl self,tcp --mca 
>>> btl_tcp_if_include venet0:0 --mca oob_tcp_if_include venet0:0 -np 1 -host 
>>> 10.0.50.41 hostname
>>> [ct111.domain.org:01054] [[12888,0],1] tcp_peer_send_blocking: send() to 
>>> socket 9 failed: Broken pipe (32)
>>> [...snip...]
>> 
>> I'm reminded of the fact that we did some tests on OpenVZ recently, and I 
>> had to add a hack to make Open MPI skip one of the interfaces:
>> 
>> -----
>> diff --git a/opal/mca/if/posix_ipv4/if_posix.c 
>> b/opal/mca/if/posix_ipv4/if_posix
>> index 6f75533..ed447e7 100644
>> --- a/opal/mca/if/posix_ipv4/if_posix.c
>> +++ b/opal/mca/if/posix_ipv4/if_posix.c
>> @@ -221,6 +221,15 @@ static int if_posix_open(void)
>>         strncpy(intf->if_name, ifr->ifr_name, sizeof(intf->if_name) - 1);
>>         intf->if_flags = ifr->ifr_flags;
>> 
>> +       // JMS Hackaround for OpenVZ
>> +       if (strcmp(intf->if_name, "venet0") == 0) {
>> +            opal_output_verbose(1, opal_if_base_framework.framework_output,
>> +                                "OpenVZ hack:%s:%d: skipping interface venet0",
>> +                                __FILE__, __LINE__);
>> +           OBJ_RELEASE(intf);
>> +            continue;
>> +       }
>> +
>>         /* every new address gets its own internal if_index */
>>         intf->if_index = opal_list_get_size(&opal_if_list)+1;
>> 
>> -----
>> 
>> Can you try this and see if it works for you?
>> 
>> If so, we might need to do something a bit more methodical / deliberate to make 
>> Open MPI work on OpenVZ.
>> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/06/29544.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
