On 13.11.2014 at 00:55, Gilles Gouaillardet wrote:

> Could you please send the output of netstat -nr on both the head node and a
> compute node?

Head node:

annemarie:~ # netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         137.248.x.y     0.0.0.0         UG        0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
137.248.x.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
192.168.151.80  0.0.0.0         255.255.255.255 UH        0 0          0 eth1
192.168.154.0   0.0.0.0         255.255.255.192 U         0 0          0 eth1
192.168.154.128 0.0.0.0         255.255.255.192 U         0 0          0 eth3

Compute node with the (wrong) entry for the non-existent GW:

node28:~ # netstat -nr 
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.154.60  0.0.0.0         UG        0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
192.168.154.0   0.0.0.0         255.255.255.192 U         0 0          0 eth0
192.168.154.64  0.0.0.0         255.255.255.192 U         0 0          0 eth1

As I said: when I remove the "default" entry for the GW, it starts up instantly.
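
For illustration, the effect of the "default" entry can be sketched as a small longest-prefix lookup over node28's table above. This is only a rough model of a routing decision, not what the kernel or Open MPI actually executes. The address 137.248.1.1 is a made-up stand-in for the obfuscated 137.248.x.y, the helper name lookup is hypothetical, and the netmasks are rewritten as prefix lengths (255.255.255.192 = /26).

```python
import ipaddress

# node28's routing table from the netstat output above, as
# (destination in CIDR form, gateway or None, interface).
NODE28_ROUTES = [
    ("0.0.0.0/0", "192.168.154.60", "eth0"),   # the "default" entry
    ("127.0.0.0/8", None, "lo"),
    ("169.254.0.0/16", None, "eth0"),
    ("192.168.154.0/26", None, "eth0"),
    ("192.168.154.64/26", None, "eth1"),
]


def lookup(routes, address):
    """Return (network, gateway, iface) of the most specific match, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for cidr, gateway, iface in routes:
        net = ipaddress.ip_network(cidr)
        # Longest-prefix match: prefer the route with the larger prefix length.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    return best


# Assumption: hypothetical address standing in for 137.248.x.y.
HEAD_EXTERNAL = "137.248.1.1"

with_default = lookup(NODE28_ROUTES, HEAD_EXTERNAL)
without_default = lookup(NODE28_ROUTES[1:], HEAD_EXTERNAL)  # default removed

print("with default route:   ", with_default)
print("without default route:", without_default)
```

With the default entry present, the external address matches only that entry, so traffic is handed to the non-existent gateway 192.168.154.60. With the entry removed, the lookup finds no route at all, which presumably lets the connection attempt fail right away instead of waiting for a timeout.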

-- Reuti



> No problem obfuscating the IP of the head node; I am only interested in
> netmasks and routes.
> 
> Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> On Nov 12, 2014, at 2:45 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>> 
>>> On 12.11.2014 at 17:27, Reuti wrote:
>>> 
>>>> On 11.11.2014 at 02:25, Ralph Castain wrote:
>>>> 
>>>>> Another thing you can do is (a) ensure you built with --enable-debug, and 
>>>>> then (b) run it with -mca oob_base_verbose 100 (without the 
>>>>> tcp_if_include option) so we can watch the connection handshake and see 
>>>>> what it is doing. The --hetero-nodes option will have no effect here and 
>>>>> can be ignored.
>>>> 
>>>> Done. It really tries to connect to the outside interface of the headnode. 
>>>> But whether there is a firewall or not, the nodes have no clue how to reach 
>>>> 137.248.0.0 - they have no gateway to this network at all.
>>> 
>>> I have to retract this. They think that there is a gateway although there 
>>> isn't one. When I remove the gateway entry from the routing table by hand, 
>>> it starts up instantly too.
>>> 
>>> While I can do this on my own cluster, I still have the 30-second delay on 
>>> a cluster where I'm not root, though this may be because of the firewall 
>>> there. The gateway on that cluster does indeed lead to the outside world.
>>> 
>>> Personally, I find it a little too aggressive to try all interfaces. If you 
>>> don't check this carefully beforehand and start a long-running application, 
>>> you might not even notice the delay during startup.
>> 
>> Agreed - do you have any suggestions on how we should choose the order in 
>> which to try them? I haven’t been able to come up with anything yet. Jeff 
>> has some fancy algo in his usnic BTL that we are going to discuss after SC 
>> and that I’m hoping will help, but I’d be open to doing something better in 
>> the interim for 1.8.4.
>> 
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> It tries this regardless of whether the internal or external name of the 
>>>> headnode is given in the machinefile - I hit ^C then. I attached the output 
>>>> of Open MPI 1.8.1 for this setup too.
>>>> 
>>>> -- Reuti
>>>> 
>>>> <openmpi1.8.3.txt><openmpi1.8.1.txt>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2014/11/25777.php
>>> 
>> 
