That is indeed bizarre - we haven’t heard of anything similar from other users. 
What is your network configuration? If you restrict things with oob_tcp_if_include 
or oob_tcp_if_exclude, does the problem go away?
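For instance, something along these lines (the interface names eth0/ib0/eoib0 are placeholders here; substitute whatever your nodes actually have):

```shell
# Restrict the out-of-band TCP channel to the interface that
# actually connects the nodes (eth0 is an assumption):
mpiexec --mca oob_tcp_if_include eth0 \
        -n 4 --hostfile machines ./mpihello

# Or, equivalently, exclude the suspect IB-backed interfaces instead:
mpiexec --mca oob_tcp_if_exclude ib0,eoib0 \
        -n 4 --hostfile machines ./mpihello
```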


> On Nov 10, 2014, at 4:50 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> 
> Am 10.11.2014 um 12:50 schrieb Jeff Squyres (jsquyres):
> 
>> Wow, that's pretty terrible!  :(
>> 
>> Is the behavior BTL-specific, perchance?  E.g., if you only use certain 
>> BTLs, does the delay disappear?
> 
> You mean something like:
> 
> reuti@annemarie:~> date; mpiexec -mca btl self,tcp -n 4 --hostfile machines 
> ./mpihello; date
> Mon Nov 10 13:44:34 CET 2014
> Hello World from Node 1.
> Total: 4
> Universe: 4
> Hello World from Node 0.
> Hello World from Node 3.
> Hello World from Node 2.
> Mon Nov 10 13:46:42 CET 2014
> 
> (the above was even the latest v1.8.3-186-g978f61d)
> 
> Falling back to 1.8.1 gives (as expected):
> 
> reuti@annemarie:~> date; mpiexec -mca btl self,tcp -n 4 --hostfile machines 
> ./mpihello; date
> Mon Nov 10 13:49:51 CET 2014
> Hello World from Node 1.
> Total: 4
> Universe: 4
> Hello World from Node 0.
> Hello World from Node 2.
> Hello World from Node 3.
> Mon Nov 10 13:49:53 CET 2014
> 
> 
> -- Reuti
> 
>> FWIW: the use-all-IP-interfaces approach has been in OMPI forever. 
>> 
>> Sent from my phone. No type good. 
>> 
>>> On Nov 10, 2014, at 6:42 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>> 
>>>> Am 10.11.2014 um 12:24 schrieb Reuti:
>>>> 
>>>> Hi,
>>>> 
>>>>> Am 09.11.2014 um 05:38 schrieb Ralph Castain:
>>>>> 
>>>>> FWIW: during MPI_Init, each process “publishes” all of its interfaces. 
>>>>> Each process receives a complete map of that info for every process in 
>>>>> the job. So when the TCP btl sets itself up, it attempts to connect 
>>>>> across -all- the interfaces published by the other end.
>>>>> 
>>>>> So it doesn’t matter what hostname is provided by the RM. We discover and 
>>>>> “share” all of the interface info for every node, and then use them for 
>>>>> load balancing.
>>>> 
>>>> does this lead to any time delay during startup? I stayed with Open MPI 
>>>> 1.6.5 for some time and tried Open MPI 1.8.3 now. As there was a 
>>>> delay when the application started with my first compilation of 1.8.3, I 
>>>> dropped even all my extra options and ran it outside of any 
>>>> queuing system - the delay remains - on two different clusters.
>>> 
>>> I forgot to mention: the delay is more or less exactly 2 minutes from the 
>>> time I issued `mpiexec` until the `mpihello` starts up (there is no delay 
>>> for the initial `ssh` to reach the other node though).
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> I tracked it down: up to 1.8.1 everything works fine, but 1.8.2 already 
>>>> introduces this delay when starting a simple mpihello. I assume it may lie 
>>>> in the way other machines are reached, as with one single machine there 
>>>> is no delay. But using one (and only one - no tree spawn involved) 
>>>> additional machine already triggers the delay.
>>>> 
>>>> Did anyone else notice it?
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> HTH
>>>>> Ralph
>>>>> 
>>>>> 
>>>>>> On Nov 8, 2014, at 8:13 PM, Brock Palen <bro...@umich.edu> wrote:
>>>>>> 
>>>>>> Ok I figured, I'm going to have to read some more for my own curiosity. 
>>>>>> The reason I mention the Resource Manager we use, and that the hostnames 
>>>>>> given by PBS/Torque match the 1gig-e interfaces: I'm curious what path 
>>>>>> it would take to get to a peer node when the node list given all matches 
>>>>>> the 1gig interfaces, yet data is being sent out the 10gig eoib0/ib0 
>>>>>> interfaces.  
>>>>>> 
>>>>>> I'll go do some measurements and see.
>>>>>> 
>>>>>> Brock Palen
>>>>>> www.umich.edu/~brockp
>>>>>> CAEN Advanced Computing
>>>>>> XSEDE Campus Champion
>>>>>> bro...@umich.edu
>>>>>> (734)936-1985
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Nov 8, 2014, at 8:30 AM, Jeff Squyres (jsquyres) 
>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>> 
>>>>>>> Ralph is right: OMPI aggressively uses all Ethernet interfaces by 
>>>>>>> default.  
>>>>>>> 
>>>>>>> This short FAQ has links to 2 other FAQs that provide detailed 
>>>>>>> information about reachability:
>>>>>>> 
>>>>>>> http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
>>>>>>> 
>>>>>>> The usNIC BTL uses UDP for its wire transport and actually does a much 
>>>>>>> more standards-conformant peer reachability determination (i.e., it 
>>>>>>> actually checks routing tables to see if it can reach a given peer, 
>>>>>>> which has all kinds of caching benefits, kernel controls if you want 
>>>>>>> them, etc.).  We haven't back-ported this to the TCP BTL because a) 
>>>>>>> most people who use TCP for MPI still use a single L2 address space, 
>>>>>>> and b) no one has asked for it.  :-)
>>>>>>> 
>>>>>>> As for the round-robin scheduling, there's no indication from the Linux 
>>>>>>> TCP stack what the bandwidth is on a given IP interface.  So unless you 
>>>>>>> use the btl_tcp_bandwidth_<IP_INTERFACE_NAME> (e.g., 
>>>>>>> btl_tcp_bandwidth_eth0) MCA params, OMPI will round-robin across them 
>>>>>>> equally.
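A sketch of the kind of command described above (the interface names and the Mbps values are placeholders, not measured numbers for this cluster):

```shell
# Hint relative bandwidths so OMPI weights the round-robin
# striping instead of splitting traffic equally
# (eth0/eoib0 and the values are assumptions):
mpiexec --mca btl_tcp_bandwidth_eth0 1000 \
        --mca btl_tcp_bandwidth_eoib0 10000 \
        -n 4 --hostfile machines ./mpihello
```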
>>>>>>> 
>>>>>>> If you have multiple IP interfaces sharing a single physical link, 
>>>>>>> there will likely be no benefit from having Open MPI use more than one 
>>>>>>> of them.  You should probably use btl_tcp_if_include / 
>>>>>>> btl_tcp_if_exclude to select just one.
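For example, hypothetically (again, eth0/ib0/eoib0 stand in for the site's real interface names; note that when you set btl_tcp_if_exclude yourself you should keep lo in the list, since your setting replaces the default exclusion):

```shell
# Restrict the TCP BTL to a single interface:
mpiexec --mca btl self,tcp --mca btl_tcp_if_include eth0 \
        -n 4 --hostfile machines ./mpihello

# Or exclude the two interfaces that share one physical IB link:
mpiexec --mca btl self,tcp --mca btl_tcp_if_exclude lo,ib0,eoib0 \
        -n 4 --hostfile machines ./mpihello
```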
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Nov 7, 2014, at 2:53 PM, Brock Palen <bro...@umich.edu> wrote:
>>>>>>>> 
>>>>>>>> I was doing a test on our IB based cluster, where I was disabling IB
>>>>>>>> 
>>>>>>>> --mca btl ^openib --mca mtl ^mxm
>>>>>>>> 
>>>>>>>> I was sending very large messages (>1GB) and I was surprised by the 
>>>>>>>> speed.
>>>>>>>> 
>>>>>>>> I noticed then that of all our ethernet interfaces
>>>>>>>> 
>>>>>>>> eth0  (1gig-e)
>>>>>>>> ib0  (ip over ib, for lustre configuration at vendor request)
>>>>>>>> eoib0  (Ethernet over IB interface for IB -> Ethernet gateway for some 
>>>>>>>> external storage support at >1Gig speed)
>>>>>>>> 
>>>>>>>> I saw all three were getting traffic.
>>>>>>>> 
>>>>>>>> We use Torque as our Resource Manager with TM support; the 
>>>>>>>> hostnames given by Torque match the eth0 interfaces.
>>>>>>>> 
>>>>>>>> How does OMPI figure out that it can also talk over the others?  How 
>>>>>>>> does it choose to load balance?
>>>>>>>> 
>>>>>>>> BTW that is fine, but we will use if_exclude on one of the IB ones, as 
>>>>>>>> ib0 and eoib0  are the same physical device and may screw with load 
>>>>>>>> balancing if anyone ever falls back to TCP.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post: 
>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/11/25709.php
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Jeff Squyres
>>>>>>> jsquy...@cisco.com
>>>>>>> For corporate legal information go to: 
>>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
> 
