That is indeed bizarre - we haven't heard of anything similar from other users. What is your network configuration? If you restrict the interfaces with oob_tcp_if_include or oob_tcp_if_exclude, does the problem go away?
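For example, something like this (a minimal sketch - eth0 is just an assumption here, substitute whichever interface actually connects your nodes):

  # restrict both the runtime wire-up (OOB) and the TCP BTL to one known-good interface
  mpiexec -mca oob_tcp_if_include eth0 -mca btl_tcp_if_include eth0 -mca btl self,tcp -n 4 --hostfile machines ./mpihello

If the two-minute delay disappears with that, it would suggest the connection attempts over one of the other interfaces (or a firewall on it) are what is timing out.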
> On Nov 10, 2014, at 4:50 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>
> On 10.11.2014, at 12:50, Jeff Squyres (jsquyres) wrote:
>
>> Wow, that's pretty terrible! :(
>>
>> Is the behavior BTL-specific, perchance? E.g., if you only use certain BTLs, does the delay disappear?
>
> You mean something like:
>
> reuti@annemarie:~> date; mpiexec -mca btl self,tcp -n 4 --hostfile machines ./mpihello; date
> Mon Nov 10 13:44:34 CET 2014
> Hello World from Node 1.
> Total: 4
> Universe: 4
> Hello World from Node 0.
> Hello World from Node 3.
> Hello World from Node 2.
> Mon Nov 10 13:46:42 CET 2014
>
> (the above was even the latest v1.8.3-186-g978f61d)
>
> Falling back to 1.8.1 gives (as expected):
>
> reuti@annemarie:~> date; mpiexec -mca btl self,tcp -n 4 --hostfile machines ./mpihello; date
> Mon Nov 10 13:49:51 CET 2014
> Hello World from Node 1.
> Total: 4
> Universe: 4
> Hello World from Node 0.
> Hello World from Node 2.
> Hello World from Node 3.
> Mon Nov 10 13:49:53 CET 2014
>
> -- Reuti
>
>> FWIW: the use-all-IP-interfaces approach has been in OMPI forever.
>>
>> Sent from my phone. No type good.
>>
>>> On Nov 10, 2014, at 6:42 AM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>
>>>> On 10.11.2014, at 12:24, Reuti wrote:
>>>>
>>>> Hi,
>>>>
>>>>> On 09.11.2014, at 05:38, Ralph Castain wrote:
>>>>>
>>>>> FWIW: during MPI_Init, each process “publishes” all of its interfaces. Each process receives a complete map of that info for every process in the job. So when the TCP btl sets itself up, it attempts to connect across -all- the interfaces published by the other end.
>>>>>
>>>>> So it doesn’t matter what hostname is provided by the RM. We discover and “share” all of the interface info for every node, and then use them for load balancing.
>>>>
>>>> Does this lead to any time delay when starting up? I stayed with Open MPI 1.6.5 for some time and tried to use Open MPI 1.8.3 now. As there was a delay when the application started with my first build of 1.8.3, I even dropped all my extra options and ran it outside of any queuing system - the delay remains, on two different clusters.
>>>
>>> I forgot to mention: the delay is more or less exactly 2 minutes from the time I issue `mpiexec` until the `mpihello` starts up (there is no delay for the initial `ssh` to reach the other node though).
>>>
>>> -- Reuti
>>>
>>>> I tracked it down: up to 1.8.1 it works fine, but 1.8.2 already introduces this delay when starting up a simple mpihello. I assume it may lie in the way other machines are reached, as with a single machine there is no delay. But using one (and only one - no tree spawn involved) additional machine already triggers the delay.
>>>>
>>>> Did anyone else notice it?
>>>>
>>>> -- Reuti
>>>>
>>>>> HTH
>>>>> Ralph
>>>>>
>>>>>> On Nov 8, 2014, at 8:13 PM, Brock Palen <bro...@umich.edu> wrote:
>>>>>>
>>>>>> OK, I figured. I'm going to have to read some more for my own curiosity. The reason I mention the resource manager we use, and that the hostnames given by PBS/Torque match the 1gig-e interfaces, is that I'm curious what path it would take to get to a peer node when the node list all matches the 1gig interfaces yet data is being sent out the 10gig eoib0/ib0 interfaces.
>>>>>>
>>>>>> I'll go do some measurements and see.
>>>>>>
>>>>>> Brock Palen
>>>>>> www.umich.edu/~brockp
>>>>>> CAEN Advanced Computing
>>>>>> XSEDE Campus Champion
>>>>>> bro...@umich.edu
>>>>>> (734)936-1985
>>>>>>
>>>>>>> On Nov 8, 2014, at 8:30 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>>>>>
>>>>>>> Ralph is right: OMPI aggressively uses all Ethernet interfaces by default.
>>>>>>>
>>>>>>> This short FAQ has links to 2 other FAQs that provide detailed information about reachability:
>>>>>>>
>>>>>>> http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
>>>>>>>
>>>>>>> The usNIC BTL uses UDP for its wire transport and actually does a much more standards-conformant peer reachability determination (i.e., it actually checks routing tables to see if it can reach a given peer, which has all kinds of caching benefits, kernel controls if you want them, etc.). We haven't back-ported this to the TCP BTL because a) most people who use TCP for MPI still use a single L2 address space, and b) no one has asked for it. :-)
>>>>>>>
>>>>>>> As for the round-robin scheduling: there's no indication from the Linux TCP stack what the bandwidth is on a given IP interface. So unless you use the btl_tcp_bandwidth_<IP_INTERFACE_NAME> (e.g., btl_tcp_bandwidth_eth0) MCA params, OMPI will round-robin across them equally.
>>>>>>>
>>>>>>> If you have multiple IP interfaces sharing a single physical link, there will likely be no benefit from having Open MPI use more than one of them. You should probably use btl_tcp_if_include / btl_tcp_if_exclude to select just one.
>>>>>>>
>>>>>>>> On Nov 7, 2014, at 2:53 PM, Brock Palen <bro...@umich.edu> wrote:
>>>>>>>>
>>>>>>>> I was doing a test on our IB-based cluster, where I was disabling IB:
>>>>>>>>
>>>>>>>> --mca btl ^openib --mca mtl ^mxm
>>>>>>>>
>>>>>>>> I was sending very large messages (>1 GB) and I was surprised by the speed.
>>>>>>>>
>>>>>>>> I noticed then that of all our Ethernet interfaces
>>>>>>>>
>>>>>>>> eth0 (1gig-e)
>>>>>>>> ib0 (IP over IB, for Lustre configuration at vendor request)
>>>>>>>> eoib0 (Ethernet over IB interface for an IB -> Ethernet gateway for some external storage support at >1gig speed)
>>>>>>>>
>>>>>>>> I saw all three were getting traffic.
>>>>>>>>
>>>>>>>> We use Torque for our resource manager and use TM support; the hostnames given by Torque match the eth0 interfaces.
>>>>>>>>
>>>>>>>> How does OMPI figure out that it can also talk over the others? How does it choose to load balance?
>>>>>>>>
>>>>>>>> BTW that is fine, but we will use if_exclude on one of the IB ones, as ib0 and eoib0 are the same physical device and may screw with load balancing if anyone ever falls back to TCP.
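As a footnote to the if_exclude plan at the bottom of the quoted thread, a minimal sketch (the interface names are just the ones Brock listed, and note that setting btl_tcp_if_exclude overrides the default exclusion list, so the loopback interface should normally stay in it):

  # keep TCP traffic off eoib0 (and the loopback), leaving eth0 and ib0 in use
  mpiexec --mca btl self,tcp --mca btl_tcp_if_exclude lo,eoib0 -n 4 --hostfile machines ./mpihello

  # or, if several interfaces stay enabled, hint their relative speeds (values purely illustrative) so the round-robin is weighted instead of equal
  mpiexec --mca btl self,tcp --mca btl_tcp_bandwidth_eth0 1000 --mca btl_tcp_bandwidth_ib0 10000 -n 4 --hostfile machines ./mpihello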