Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-14 Thread Jeff Squyres (jsquyres)
On Nov 14, 2014, at 10:52 AM, Reuti wrote: > I appreciate your replies and will read them thoroughly. I think it's best to > continue with the discussion after SC14. I don't want to put any burden on > anyone when time is tight. Cool; many thanks. This is complicated stuff; we might not have

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-14 Thread Reuti
Jeff, Gus, Gilles, Am 14.11.2014 um 15:56 schrieb Jeff Squyres (jsquyres): > I lurked on this thread for a while, but I have some thoughts on the many > issues that were discussed on this thread (sorry, I'm still pretty under > water trying to get ready for SC next week...). I appreciate your

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-14 Thread Jeff Squyres (jsquyres)
I lurked on this thread for a while, but I have some thoughts on the many issues that were discussed on this thread (sorry, I'm still pretty under water trying to get ready for SC next week...). These points are in no particular order... 0. Two fundamental points have been missed in this threa

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Gilles Gouaillardet
My 0.02 US$: first, the root cause of the problem was that a default gateway was configured on the node, but this gateway was unreachable. IMHO, this is an incorrect system setting that can lead to unpredictable results: - openmpi 1.8.1 works (you are lucky, good for you) - openmpi 1.8.3 fails (no luck th

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Gus Correa
On 11/13/2014 11:14 AM, Ralph Castain wrote: Hmmm…I’m beginning to grok the issue. It is a tad unusual for people to assign different hostnames to their interfaces - I’ve seen it in the Hadoop world, but not in HPC. Still, no law against it. No, not so unusual. I have clusters from respectable

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Gus Correa
Hi Reuti See below, please. On 11/13/2014 07:19 AM, Reuti wrote: Gus, Am 13.11.2014 um 02:59 schrieb Gus Correa: On 11/12/2014 05:45 PM, Reuti wrote: Am 12.11.2014 um 17:27 schrieb Reuti: Am 11.11.2014 um 02:25 schrieb Ralph Castain: Another thing you can do is (a) ensure you built with

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Ralph Castain
> On Nov 13, 2014, at 9:20 AM, Reuti wrote: > > Am 13.11.2014 um 17:14 schrieb Ralph Castain: > >> Hmmm…I’m beginning to grok the issue. It is a tad unusual for people to >> assign different hostnames to their interfaces - I’ve seen it in the Hadoop >> world, but not in HPC. Still, no law aga

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Reuti
Am 13.11.2014 um 17:14 schrieb Ralph Castain: > Hmmm…I’m beginning to grok the issue. It is a tad unusual for people to > assign different hostnames to their interfaces - I’ve seen it in the Hadoop > world, but not in HPC. Still, no law against it. Maybe it depends on the background to do it th

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Ralph Castain
Hmmm…I’m beginning to grok the issue. It is a tad unusual for people to assign different hostnames to their interfaces - I’ve seen it in the Hadoop world, but not in HPC. Still, no law against it. This will take a little thought to figure out a solution. One problem that immediately occurs is i

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Reuti
Am 13.11.2014 um 00:34 schrieb Ralph Castain: >> On Nov 12, 2014, at 2:45 PM, Reuti wrote: >> >> Am 12.11.2014 um 17:27 schrieb Reuti: >> >>> Am 11.11.2014 um 02:25 schrieb Ralph Castain: >>> Another thing you can do is (a) ensure you built with --enable-debug, and then (b) run it wi

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-13 Thread Reuti
Gus, Am 13.11.2014 um 02:59 schrieb Gus Correa: > On 11/12/2014 05:45 PM, Reuti wrote: >> Am 12.11.2014 um 17:27 schrieb Reuti: >> >>> Am 11.11.2014 um 02:25 schrieb Ralph Castain: >>> Another thing you can do is (a) ensure you built with --enable-debug, >> and then (b) run it with -mca oob

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Gus Correa
On 11/12/2014 05:45 PM, Reuti wrote: Am 12.11.2014 um 17:27 schrieb Reuti: Am 11.11.2014 um 02:25 schrieb Ralph Castain: Another thing you can do is (a) ensure you built with --enable-debug, and then (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include option) so we can watch

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Ralph Castain
> On Nov 12, 2014, at 2:45 PM, Reuti wrote: > > Am 12.11.2014 um 17:27 schrieb Reuti: > >> Am 11.11.2014 um 02:25 schrieb Ralph Castain: >> >>> Another thing you can do is (a) ensure you built with —enable-debug, and >>> then (b) run it with -mca oob_base_verbose 100 (without the tcp_if_incl

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Reuti
Am 12.11.2014 um 17:27 schrieb Reuti: > Am 11.11.2014 um 02:25 schrieb Ralph Castain: > >> Another thing you can do is (a) ensure you built with --enable-debug, and >> then (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include >> option) so we can watch the connection handshake

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Reuti
Am 11.11.2014 um 02:25 schrieb Ralph Castain: > Another thing you can do is (a) ensure you built with --enable-debug, and then > (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include > option) so we can watch the connection handshake and see what it is doing. > The --hetero-nodes

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-12 Thread Reuti
Am 11.11.2014 um 02:12 schrieb Gilles Gouaillardet: > Hi, > > IIRC there were some bug fixes between 1.8.1 and 1.8.2 in order to really use > all the published interfaces. > > by any chance, are you running a firewall on your head node? Yes, but only for the interface to the outside world. N

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Ralph Castain
Another thing you can do is (a) ensure you built with --enable-debug, and then (b) run it with -mca oob_base_verbose 100 (without the tcp_if_include option) so we can watch the connection handshake and see what it is doing. The --hetero-nodes option will have no effect here and can be ignored. Ralph
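
The debug run suggested above might look roughly like the following sketch. It assumes Open MPI was configured with --enable-debug; the hostfile name, the process count, and the program name are placeholders, not values from the thread. The script only prints the command unless RUN=1 is set, so it can be tried on a machine without Open MPI.

```shell
#!/bin/sh
# Sketch of the suggested debug run. HOSTFILE and PROG are placeholders
# (assumptions), not names taken from the thread.
HOSTFILE="${HOSTFILE:-machines}"
PROG="${PROG:-./mpi_hello}"

# Print an mpiexec command line that raises the OOB framework verbosity.
# No *_if_include option is passed, so the default interface selection
# is what gets logged.
debug_cmd() {
    echo "mpiexec --mca oob_base_verbose 100 -n 4 --hostfile $HOSTFILE $PROG"
}

debug_cmd

# Execute only on request, keeping a copy of the handshake log.
if [ "${RUN:-0}" = "1" ]; then
    $(debug_cmd) 2>&1 | tee oob_debug.log
fi
```

The resulting oob_debug.log is the kind of output that could then be posted back to the list.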

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Gilles Gouaillardet
Hi, IIRC there were some bug fixes between 1.8.1 and 1.8.2 in order to really use all the published interfaces. By any chance, are you running a firewall on your head node? One possible explanation is that the compute node tries to access the public interface of the head node, and packets get dropped

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Reuti
Hi, Am 10.11.2014 um 16:39 schrieb Ralph Castain: > That is indeed bizarre - we haven’t heard of anything similar from other > users. What is your network configuration? If you use oob_tcp_if_include or > exclude, can you resolve the problem? Thx - this option helped to get it working. These

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Ralph Castain
That is indeed bizarre - we haven’t heard of anything similar from other users. What is your network configuration? If you use oob_tcp_if_include or exclude, can you resolve the problem? > On Nov 10, 2014, at 4:50 AM, Reuti wrote: > > Am 10.11.2014 um 12:50 schrieb Jeff Squyres (jsquyres): >
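
As a rough illustration of the oob_tcp_if_include / oob_tcp_if_exclude question above, the sketch below prints two candidate command lines. The interface names (eth0, eth1), the hostfile, and the program name are hypothetical; which interface is the cluster-internal one depends entirely on the site.

```shell
#!/bin/sh
# IFACE is a placeholder (assumption) for the cluster-internal interface.
IFACE="${IFACE:-eth0}"

# Restrict the out-of-band (runtime) TCP channel to IFACE only.
oob_include_cmd() {
    echo "mpiexec --mca oob_tcp_if_include $IFACE -n 4 --hostfile machines ./mpi_hello"
}

# Alternative: allow everything except the interface given as $1.
# Loopback is listed too, on the assumption that overriding the default
# exclude list would otherwise re-enable it.
oob_exclude_cmd() {
    echo "mpiexec --mca oob_tcp_if_exclude lo,$1 -n 4 --hostfile machines ./mpi_hello"
}

oob_include_cmd
oob_exclude_cmd eth1
```

The include and exclude forms are alternatives; only one of them should be given in a single run.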

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Reuti
Am 10.11.2014 um 12:50 schrieb Jeff Squyres (jsquyres): > Wow, that's pretty terrible! :( > > Is the behavior BTL-specific, perchance? E.G., if you only use certain BTLs, > does the delay disappear? You mean something like: reuti@annemarie:~> date; mpiexec -mca btl self,tcp -n 4 --hostfile m

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Jeff Squyres (jsquyres)
Wow, that's pretty terrible! :( Is the behavior BTL-specific, perchance? E.G., if you only use certain BTLs, does the delay disappear? FWIW: the use-all-IP interfaces approach has been in OMPI forever. Sent from my phone. No type good. > On Nov 10, 2014, at 6:42 AM, Reuti wrote: > >> Am

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Reuti
Am 10.11.2014 um 12:24 schrieb Reuti: > Hi, > > Am 09.11.2014 um 05:38 schrieb Ralph Castain: > >> FWIW: during MPI_Init, each process “publishes” all of its interfaces. Each >> process receives a complete map of that info for every process in the job. >> So when the TCP btl sets itself up, it

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-10 Thread Reuti
Hi, Am 09.11.2014 um 05:38 schrieb Ralph Castain: > FWIW: during MPI_Init, each process “publishes” all of its interfaces. Each > process receives a complete map of that info for every process in the job. So > when the TCP btl sets itself up, it attempts to connect across -all- the > interface

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Ralph Castain
FWIW: during MPI_Init, each process “publishes” all of its interfaces. Each process receives a complete map of that info for every process in the job. So when the TCP btl sets itself up, it attempts to connect across -all- the interfaces published by the other end. So it doesn’t matter what hos

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Brock Palen
OK, I figured; I'm going to have to read some more for my own curiosity. The reason I mention the resource manager we use, and that the hostnames given by PBS/Torque match the 1gig-e interfaces: I'm curious what path it would take to get to a peer node when the node list given all match the 1gig

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-08 Thread Jeff Squyres (jsquyres)
Ralph is right: OMPI aggressively uses all Ethernet interfaces by default. This short FAQ has links to 2 other FAQs that provide detailed information about reachability: http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network The usNIC BTL uses UDP for its wire transport and actually

Re: [OMPI users] How OMPI picks ethernet interfaces

2014-11-07 Thread Ralph Castain
OMPI discovers all active interfaces and automatically considers them available for its use unless instructed otherwise via the params. I’d have to look at the TCP BTL code to see the loadbalancing algo - I thought we didn’t have that “on” by default across BTLs, but I don’t know if the TCP one
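
One way to "instruct otherwise via the params" for MPI traffic is the TCP BTL's btl_tcp_if_include parameter, which takes interface names or CIDR subnets. A minimal sketch, in which the interface name, the subnet, the hostfile, and the program are placeholders rather than values from the thread:

```shell
#!/bin/sh
# Print an mpiexec command line that limits the TCP BTL to one network.
# $1 is either an interface name or a CIDR subnet (both hypothetical here).
btl_cmd() {
    echo "mpiexec --mca btl self,tcp --mca btl_tcp_if_include $1 -n 4 --hostfile machines ./mpi_hello"
}

btl_cmd eth0             # select by interface name
btl_cmd 192.168.1.0/24   # select by subnet
```

Selecting by subnet can be handy when interface names differ between nodes. This parameter covers only the TCP BTL; the out-of-band channel is controlled separately through the oob_tcp_* parameters mentioned elsewhere in the thread.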