Could you please send the output of netstat -nr on both the head node and a compute node? No problem obfuscating the IP of the head node; I am only interested in netmasks and routes.
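
For reference, what I want to check in those tables is which route a compute node would pick for the head node's public address. Here is a minimal sketch in Python, assuming Linux-style destination/netmask/gateway entries. The table and every address in it are made up for illustration (203.0.113.10 is a documentation address standing in for the head node's public IP), and this is only ordinary longest-prefix matching, not how Open MPI itself chooses interfaces:

```python
import ipaddress

# Hypothetical routing table of a compute node; all values are invented.
# Each entry: (destination, netmask, gateway, interface)
ROUTES = [
    ("192.168.0.0", "255.255.255.0", "0.0.0.0", "eth0"),  # directly attached private net
    ("0.0.0.0", "0.0.0.0", "192.168.0.254", "eth0"),      # default route via a gateway
]


def prefix_length(netmask):
    """Count the set bits of a dotted-quad netmask (assumes a contiguous mask)."""
    return bin(int(ipaddress.IPv4Address(netmask))).count("1")


def select_route(dest, routes):
    """Return (network, gateway, iface) of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dest)
    best = None
    for destination, netmask, gateway, iface in routes:
        net = ipaddress.ip_network(f"{destination}/{prefix_length(netmask)}")
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, iface)
    return best


if __name__ == "__main__":
    # With the default route present, the public address looks reachable.
    print(select_route("203.0.113.10", ROUTES))
    # Without it, no route matches and a connection attempt fails immediately.
    print(select_route("203.0.113.10", ROUTES[:1]))
```

If a default route shows up on the compute node, the node will believe the public address is reachable through that gateway even when nothing forwards the traffic, which would match the delay you see until the entry is removed.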
Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Nov 12, 2014, at 2:45 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>
>> On 12.11.2014 at 17:27, Reuti wrote:
>>
>>> On 11.11.2014 at 02:25, Ralph Castain wrote:
>>>
>>>> Another thing you can do is (a) ensure you built with --enable-debug, and
>>>> then (b) run it with -mca oob_base_verbose 100 (without the
>>>> tcp_if_include option) so we can watch the connection handshake and see
>>>> what it is doing. The --hetero-nodes option will have no effect here and
>>>> can be ignored.
>>>
>>> Done. It really tries to connect to the outside interface of the headnode.
>>> But firewall or not: the nodes have no clue how to reach
>>> 137.248.0.0 - they have no gateway to this network at all.
>>
>> I have to retract this. They think that there is a gateway although there
>> isn't one. When I remove the gateway entry from the routing table by hand,
>> it starts up instantly too.
>>
>> While I can do this on my own cluster, I still have the 30-second delay on a
>> cluster where I'm not root, though this may be because of the firewall there.
>> The gateway on this cluster does indeed lead to the outside world.
>>
>> Personally I find it a little too aggressive to use all interfaces. If you
>> don't check this carefully beforehand and start a long-running application,
>> you might not even notice the delay during startup.
>
>Agreed - do you have any suggestions on how we should choose the order in
>which to try them? I haven’t been able to come up with anything yet. Jeff has
>some fancy algo in his usnic BTL that we are going to discuss after SC that
>I’m hoping will help, but I’d be open to doing something better in the interim
>for 1.8.4.
>
>>
>> -- Reuti
>>
>>
>>> It does so regardless of whether the internal or external name of the
>>> headnode is given in the machinefile - I hit ^C then. I attached the output
>>> of Open MPI 1.8.1 for this setup too.
>>>
>>> -- Reuti
>>>
>>> <openmpi1.8.3.txt><openmpi1.8.1.txt>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2014/11/25777.php