Yes, I know - it has been CMR'd.
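
In the meantime, a possible workaround - only a sketch, and assuming the CIDR/network form of oob_tcp_if_include is accepted by these builds (the subnet below is a placeholder; substitute the actual subnet of your ib0 interface) - would be to pass the network specification instead of the interface name:

  $ mpirun --mca oob_tcp_if_include 10.128.0.0/16 -np 1 ./hello_c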

On Aug 20, 2014, at 10:26 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

> BTW, we get the same error in the v1.8 branch as well.
> 
> 
> On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
> It was not yet fixed - but should be now.
> 
> On Aug 20, 2014, at 6:39 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
> 
>> Hello!
>> 
>> As far as I can see, the bug is fixed, but in Open MPI v1.9a1r32516 I still have 
>> the problem:
>> 
>> a)
>> $ mpirun -np 1 ./hello_c
>> 
>> --------------------------------------------------------------------------
>> An ORTE daemon has unexpectedly failed after launch and before
>> communicating back to mpirun. This could be caused by a number
>> of factors, including an inability to create a connection back
>> to mpirun due to a lack of common network interfaces and/or no
>> route found between them. Please check network connectivity
>> (including firewalls and network routing requirements).
>> --------------------------------------------------------------------------
>> 
>> b)
>> $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
>> --------------------------------------------------------------------------
>> An ORTE daemon has unexpectedly failed after launch and before
>> communicating back to mpirun. This could be caused by a number
>> of factors, including an inability to create a connection back
>> to mpirun due to a lack of common network interfaces and/or no
>> route found between them. Please check network connectivity
>> (including firewalls and network routing requirements).
>> --------------------------------------------------------------------------
>> 
>> c)
>> 
>> $ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca plm_base_verbose 
>> 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 -np 1 ./hello_c
>> 
>> [compiler-2:14673] mca:base:select:( plm) Querying component [isolated]
>> [compiler-2:14673] mca:base:select:( plm) Query of component [isolated] set 
>> priority to 0
>> [compiler-2:14673] mca:base:select:( plm) Querying component [rsh]
>> [compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set 
>> priority to 10
>> [compiler-2:14673] mca:base:select:( plm) Querying component [slurm]
>> [compiler-2:14673] mca:base:select:( plm) Query of component [slurm] set 
>> priority to 75
>> [compiler-2:14673] mca:base:select:( plm) Selected component [slurm]
>> [compiler-2:14673] mca: base: components_register: registering oob components
>> [compiler-2:14673] mca: base: components_register: found loaded component tcp
>> [compiler-2:14673] mca: base: components_register: component tcp register 
>> function successful
>> [compiler-2:14673] mca: base: components_open: opening oob components
>> [compiler-2:14673] mca: base: components_open: found loaded component tcp
>> [compiler-2:14673] mca: base: components_open: component tcp open function 
>> successful
>> [compiler-2:14673] mca:oob:select: checking available component tcp
>> [compiler-2:14673] mca:oob:select: Querying component [tcp]
>> [compiler-2:14673] oob:tcp: component_available called
>> [compiler-2:14673] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>> [compiler-2:14673] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>> [compiler-2:14673] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>> [compiler-2:14673] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>> [compiler-2:14673] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>> [compiler-2:14673] [[49095,0],0] oob:tcp:init adding 10.128.0.4 to our list 
>> of V4 connections
>> [compiler-2:14673] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>> [compiler-2:14673] [[49095,0],0] TCP STARTUP
>> [compiler-2:14673] [[49095,0],0] attempting to bind to IPv4 port 0
>> [compiler-2:14673] [[49095,0],0] assigned IPv4 port 59460
>> [compiler-2:14673] mca:oob:select: Adding component to end
>> [compiler-2:14673] mca:oob:select: Found 1 active transports
>> [compiler-2:14673] mca: base: components_register: registering rml components
>> [compiler-2:14673] mca: base: components_register: found loaded component oob
>> [compiler-2:14673] mca: base: components_register: component oob has no 
>> register or open function
>> [compiler-2:14673] mca: base: components_open: opening rml components
>> [compiler-2:14673] mca: base: components_open: found loaded component oob
>> [compiler-2:14673] mca: base: components_open: component oob open function 
>> successful
>> [compiler-2:14673] orte_rml_base_select: initializing rml component oob
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 30 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 15 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 32 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 33 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 5 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 10 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 12 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 9 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 34 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 2 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 21 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 22 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 45 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 46 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 1 for peer 
>> [[WILDCARD],WILDCARD]
>> [compiler-2:14673] [[49095,0],0] posting recv
>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 27 for peer 
>> [[WILDCARD],WILDCARD]
>> Daemon was launched on node1-128-01 - beginning to initialize
>> --------------------------------------------------------------------------
>> WARNING: An invalid value was given for oob_tcp_if_include. This
>> value will be ignored.
>> 
>> Local host: node1-128-01
>> Value: "ib0"
>> Message: Invalid specification (missing "/")
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> None of the TCP networks specified to be included for out-of-band 
>> communications
>> could be found:
>> 
>> Value given:
>> 
>> Please revise the specification and try again.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> No network interfaces were found for out-of-band communications. We require
>> at least one available network for out-of-band messaging.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>> 
>> orte_oob_base_select failed
>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>> srun: error: node1-128-01: task 0: Exited with exit code 213
>> srun: Terminating job step 661215.0
>> --------------------------------------------------------------------------
>> An ORTE daemon has unexpectedly failed after launch and before
>> communicating back to mpirun. This could be caused by a number
>> of factors, including an inability to create a connection back
>> to mpirun due to a lack of common network interfaces and/or no
>> route found between them. Please check network connectivity
>> (including firewalls and network routing requirements).
>> --------------------------------------------------------------------------
>> [compiler-2:14673] [[49095,0],0] orted_cmd: received halt_vm cmd
>> [compiler-2:14673] mca: base: close: component oob closed
>> [compiler-2:14673] mca: base: close: unloading component oob
>> [compiler-2:14673] [[49095,0],0] TCP SHUTDOWN
>> [compiler-2:14673] mca: base: close: component tcp closed
>> [compiler-2:14673] mca: base: close: unloading component tcp
>> 
>> 
>> 
>> 
>> Tue, 12 Aug 2014 18:33:24 +0000 from "Jeff Squyres (jsquyres)" 
>> <jsquy...@cisco.com>:
>> I filed the following ticket:
>> 
>>     https://svn.open-mpi.org/trac/ompi/ticket/4857
>> 
>> 
>> On Aug 12, 2014, at 12:39 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>> wrote:
>> 
>> > (please keep the users list CC'ed)
>> > 
>> > We talked about this on the weekly engineering call today. Ralph has an 
>> > idea of what is happening -- I need to do a little investigation today and 
>> > file a bug. I'll make sure you're CC'ed on the bug ticket.
>> > 
>> > 
>> > 
>> > On Aug 12, 2014, at 12:27 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>> > 
>> >> I don't see this error in OMPI 1.9a1r32252 or OMPI 1.8.1 (with --mca 
>> >> oob_tcp_if_include ib0), but in all of the latest nightly snapshots I get 
>> >> this error.
>> >> 
>> >> 
>> >> Tue, 12 Aug 2014 13:08:12 +0000 from "Jeff Squyres (jsquyres)" 
>> >> <jsquy...@cisco.com>:
>> >> Are you running any kind of firewall on the node where mpirun is invoked? 
>> >> Open MPI needs to be able to use arbitrary TCP ports between the servers 
>> >> on which it runs.
>> >> 
>> >> This second mail seems to imply a bug in OMPI's oob_tcp_if_include param 
>> >> handling, however -- it's supposed to be able to handle an interface name 
>> >> (not just a network specification).
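>> >> 
>> >> For example (an illustrative sketch only - the CIDR subnet below is a 
>> >> placeholder, not a value taken from this thread), the parameter should 
>> >> accept either an interface name or a CIDR network specification:
>> >> 
>> >>   mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
>> >>   mpirun --mca oob_tcp_if_include 10.128.0.0/16 -np 1 ./hello_c
>> >> 
>> >> The "Invalid specification (missing "/")" warning in the quoted output 
>> >> below suggests that only the second (CIDR) form is currently being parsed.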
>> >> 
>> >> Ralph -- can you have a look?
>> >> 
>> >> 
>> >> On Aug 12, 2014, at 8:41 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>> >> 
>> >>> When I add --mca oob_tcp_if_include ib0 (the InfiniBand interface) to mpirun 
>> >>> (as suggested here: 
>> >>> http://www.open-mpi.org/community/lists/users/2014/07/24857.php ), I get 
>> >>> this output:
>> >>> 
>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component [isolated]
>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component [isolated] 
>> >>> set priority to 0
>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component [rsh]
>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component [rsh] set 
>> >>> priority to 10
>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component [slurm]
>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component [slurm] set 
>> >>> priority to 75
>> >>> [compiler-2:08792] mca:base:select:( plm) Selected component [slurm]
>> >>> [compiler-2:08792] mca: base: components_register: registering oob 
>> >>> components
>> >>> [compiler-2:08792] mca: base: components_register: found loaded 
>> >>> component tcp
>> >>> [compiler-2:08792] mca: base: components_register: component tcp 
>> >>> register function successful
>> >>> [compiler-2:08792] mca: base: components_open: opening oob components
>> >>> [compiler-2:08792] mca: base: components_open: found loaded component tcp
>> >>> [compiler-2:08792] mca: base: components_open: component tcp open 
>> >>> function successful
>> >>> [compiler-2:08792] mca:oob:select: checking available component tcp
>> >>> [compiler-2:08792] mca:oob:select: Querying component [tcp]
>> >>> [compiler-2:08792] oob:tcp: component_available called
>> >>> [compiler-2:08792] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>> >>> [compiler-2:08792] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>> >>> [compiler-2:08792] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>> >>> [compiler-2:08792] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>> >>> [compiler-2:08792] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>> >>> [compiler-2:08792] [[42190,0],0] oob:tcp:init adding 10.128.0.4 to our 
>> >>> list of V4 connections
>> >>> [compiler-2:08792] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>> >>> [compiler-2:08792] [[42190,0],0] TCP STARTUP
>> >>> [compiler-2:08792] [[42190,0],0] attempting to bind to IPv4 port 0
>> >>> [compiler-2:08792] [[42190,0],0] assigned IPv4 port 53883
>> >>> [compiler-2:08792] mca:oob:select: Adding component to end
>> >>> [compiler-2:08792] mca:oob:select: Found 1 active transports
>> >>> [compiler-2:08792] mca: base: components_register: registering rml 
>> >>> components
>> >>> [compiler-2:08792] mca: base: components_register: found loaded 
>> >>> component oob
>> >>> [compiler-2:08792] mca: base: components_register: component oob has no 
>> >>> register or open function
>> >>> [compiler-2:08792] mca: base: components_open: opening rml components
>> >>> [compiler-2:08792] mca: base: components_open: found loaded component oob
>> >>> [compiler-2:08792] mca: base: components_open: component oob open 
>> >>> function successful
>> >>> [compiler-2:08792] orte_rml_base_select: initializing rml component oob
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 30 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 15 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 32 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 33 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 5 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 10 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 12 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 9 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 34 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 2 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 21 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 22 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 45 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 46 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 1 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 27 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> Daemon was launched on node1-128-01 - beginning to initialize
>> >>> Daemon was launched on node1-128-02 - beginning to initialize
>> >>> --------------------------------------------------------------------------
>> >>> WARNING: An invalid value was given for oob_tcp_if_include. This
>> >>> value will be ignored.
>> >>> 
>> >>> Local host: node1-128-01
>> >>> Value: "ib0"
>> >>> Message: Invalid specification (missing "/")
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> WARNING: An invalid value was given for oob_tcp_if_include. This
>> >>> value will be ignored.
>> >>> 
>> >>> Local host: node1-128-02
>> >>> Value: "ib0"
>> >>> Message: Invalid specification (missing "/")
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> None of the TCP networks specified to be included for out-of-band 
>> >>> communications
>> >>> could be found:
>> >>> 
>> >>> Value given:
>> >>> 
>> >>> Please revise the specification and try again.
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> None of the TCP networks specified to be included for out-of-band 
>> >>> communications
>> >>> could be found:
>> >>> 
>> >>> Value given:
>> >>> 
>> >>> Please revise the specification and try again.
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> No network interfaces were found for out-of-band communications. We 
>> >>> require
>> >>> at least one available network for out-of-band messaging.
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> No network interfaces were found for out-of-band communications. We 
>> >>> require
>> >>> at least one available network for out-of-band messaging.
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> It looks like orte_init failed for some reason; your parallel process is
>> >>> likely to abort. There are many reasons that a parallel process can
>> >>> fail during orte_init; some of which are due to configuration or
>> >>> environment problems. This failure appears to be an internal failure;
>> >>> here's some additional information (which may only be relevant to an
>> >>> Open MPI developer):
>> >>> 
>> >>> orte_oob_base_select failed
>> >>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>> >>> --------------------------------------------------------------------------
>> >>> --------------------------------------------------------------------------
>> >>> It looks like orte_init failed for some reason; your parallel process is
>> >>> likely to abort. There are many reasons that a parallel process can
>> >>> fail during orte_init; some of which are due to configuration or
>> >>> environment problems. This failure appears to be an internal failure;
>> >>> here's some additional information (which may only be relevant to an
>> >>> Open MPI developer):
>> >>> 
>> >>> orte_oob_base_select failed
>> >>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>> >>> --------------------------------------------------------------------------
>> >>> srun: error: node1-128-02: task 1: Exited with exit code 213
>> >>> srun: Terminating job step 657300.0
>> >>> srun: error: node1-128-01: task 0: Exited with exit code 213
>> >>> --------------------------------------------------------------------------
>> >>> An ORTE daemon has unexpectedly failed after launch and before
>> >>> communicating back to mpirun. This could be caused by a number
>> >>> of factors, including an inability to create a connection back
>> >>> to mpirun due to a lack of common network interfaces and/or no
>> >>> route found between them. Please check network connectivity
>> >>> (including firewalls and network routing requirements).
>> >>> --------------------------------------------------------------------------
>> >>> [compiler-2:08792] [[42190,0],0] orted_cmd: received halt_vm cmd
>> >>> [compiler-2:08792] mca: base: close: component oob closed
>> >>> [compiler-2:08792] mca: base: close: unloading component oob
>> >>> [compiler-2:08792] [[42190,0],0] TCP SHUTDOWN
>> >>> [compiler-2:08792] mca: base: close: component tcp closed
>> >>> [compiler-2:08792] mca: base: close: unloading component tcp
>> >>> 
>> >>> 
>> >>> 
>> >>> Tue, 12 Aug 2014 16:14:58 +0400 from Timur Ismagilov <tismagi...@mail.ru>:
>> >>> Hello!
>> >>> 
>> >>> I have Open MPI v1.8.2rc4r32485
>> >>> 
>> >>> When I run hello_c, I get this error message:
>> >>> $ mpirun -np 2 hello_c
>> >>> 
>> >>> An ORTE daemon has unexpectedly failed after launch and before
>> >>> communicating back to mpirun. This could be caused by a number
>> >>> of factors, including an inability to create a connection back
>> >>> to mpirun due to a lack of common network interfaces and/or no
>> >>> route found between them. Please check network connectivity
>> >>> (including firewalls and network routing requirements).
>> >>> 
>> >>> When I run with --debug-daemons --mca plm_base_verbose 5 -mca 
>> >>> oob_base_verbose 10 -mca rml_base_verbose 10, I get this output:
>> >>> $ mpirun --debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 
>> >>> 10 -mca rml_base_verbose 10 -np 2 hello_c
>> >>> 
>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component [isolated]
>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component [isolated] 
>> >>> set priority to 0
>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component [rsh]
>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component [rsh] set 
>> >>> priority to 10
>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component [slurm]
>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component [slurm] set 
>> >>> priority to 75
>> >>> [compiler-2:08780] mca:base:select:( plm) Selected component [slurm]
>> >>> [compiler-2:08780] mca: base: components_register: registering oob 
>> >>> components
>> >>> [compiler-2:08780] mca: base: components_register: found loaded 
>> >>> component tcp
>> >>> [compiler-2:08780] mca: base: components_register: component tcp 
>> >>> register function successful
>> >>> [compiler-2:08780] mca: base: components_open: opening oob components
>> >>> [compiler-2:08780] mca: base: components_open: found loaded component tcp
>> >>> [compiler-2:08780] mca: base: components_open: component tcp open 
>> >>> function successful
>> >>> [compiler-2:08780] mca:oob:select: checking available component tcp
>> >>> [compiler-2:08780] mca:oob:select: Querying component [tcp]
>> >>> [compiler-2:08780] oob:tcp: component_available called
>> >>> [compiler-2:08780] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>> >>> [compiler-2:08780] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.251.53 to our 
>> >>> list of V4 connections
>> >>> [compiler-2:08780] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.0.4 to our 
>> >>> list of V4 connections
>> >>> [compiler-2:08780] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.2.251.14 to our 
>> >>> list of V4 connections
>> >>> [compiler-2:08780] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.128.0.4 to our 
>> >>> list of V4 connections
>> >>> [compiler-2:08780] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 93.180.7.38 to our 
>> >>> list of V4 connections
>> >>> [compiler-2:08780] [[42202,0],0] TCP STARTUP
>> >>> [compiler-2:08780] [[42202,0],0] attempting to bind to IPv4 port 0
>> >>> [compiler-2:08780] [[42202,0],0] assigned IPv4 port 38420
>> >>> [compiler-2:08780] mca:oob:select: Adding component to end
>> >>> [compiler-2:08780] mca:oob:select: Found 1 active transports
>> >>> [compiler-2:08780] mca: base: components_register: registering rml 
>> >>> components
>> >>> [compiler-2:08780] mca: base: components_register: found loaded 
>> >>> component oob
>> >>> [compiler-2:08780] mca: base: components_register: component oob has no 
>> >>> register or open function
>> >>> [compiler-2:08780] mca: base: components_open: opening rml components
>> >>> [compiler-2:08780] mca: base: components_open: found loaded component oob
>> >>> [compiler-2:08780] mca: base: components_open: component oob open 
>> >>> function successful
>> >>> [compiler-2:08780] orte_rml_base_select: initializing rml component oob
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 30 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 15 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 32 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 33 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 5 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 10 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 12 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 9 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 34 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 2 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 21 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 22 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 45 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 46 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 1 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 27 for 
>> >>> peer [[WILDCARD],WILDCARD]
>> >>> Daemon was launched on node1-130-08 - beginning to initialize
>> >>> Daemon was launched on node1-130-03 - beginning to initialize
>> >>> Daemon was launched on node1-130-05 - beginning to initialize
>> >>> Daemon was launched on node1-130-02 - beginning to initialize
>> >>> Daemon was launched on node1-130-01 - beginning to initialize
>> >>> Daemon was launched on node1-130-04 - beginning to initialize
>> >>> Daemon was launched on node1-130-07 - beginning to initialize
>> >>> Daemon was launched on node1-130-06 - beginning to initialize
>> >>> Daemon [[42202,0],3] checking in as pid 7178 on host node1-130-03
>> >>> [node1-130-03:07178] [[42202,0],3] orted: up and running - waiting for 
>> >>> commands!
>> >>> Daemon [[42202,0],2] checking in as pid 13581 on host node1-130-02
>> >>> [node1-130-02:13581] [[42202,0],2] orted: up and running - waiting for 
>> >>> commands!
>> >>> Daemon [[42202,0],1] checking in as pid 17220 on host node1-130-01
>> >>> [node1-130-01:17220] [[42202,0],1] orted: up and running - waiting for 
>> >>> commands!
>> >>> Daemon [[42202,0],5] checking in as pid 6663 on host node1-130-05
>> >>> [node1-130-05:06663] [[42202,0],5] orted: up and running - waiting for 
>> >>> commands!
>> >>> Daemon [[42202,0],8] checking in as pid 6683 on host node1-130-08
>> >>> [node1-130-08:06683] [[42202,0],8] orted: up and running - waiting for 
>> >>> commands!
>> >>> Daemon [[42202,0],7] checking in as pid 7877 on host node1-130-07
>> >>> [node1-130-07:07877] [[42202,0],7] orted: up and running - waiting for 
>> >>> commands!
>> >>> Daemon [[42202,0],4] checking in as pid 7735 on host node1-130-04
>> >>> [node1-130-04:07735] [[42202,0],4] orted: up and running - waiting for 
>> >>> commands!
>> >>> Daemon [[42202,0],6] checking in as pid 8451 on host node1-130-06
>> >>> [node1-130-06:08451] [[42202,0],6] orted: up and running - waiting for 
>> >>> commands!
>> >>> srun: error: node1-130-03: task 2: Exited with exit code 1
>> >>> srun: Terminating job step 657040.1
>> >>> srun: error: node1-130-02: task 1: Exited with exit code 1
>> >>> slurmd[node1-130-04]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 
>> >>> WITH SIGNAL 9 ***
>> >>> slurmd[node1-130-07]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 
>> >>> WITH SIGNAL 9 ***
>> >>> slurmd[node1-130-06]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 
>> >>> WITH SIGNAL 9 ***
>> >>> srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
>> >>> srun: error: node1-130-01: task 0: Exited with exit code 1
>> >>> srun: error: node1-130-05: task 4: Exited with exit code 1
>> >>> srun: error: node1-130-08: task 7: Exited with exit code 1
>> >>> srun: error: node1-130-07: task 6: Exited with exit code 1
>> >>> srun: error: node1-130-04: task 3: Killed
>> >>> srun: error: node1-130-06: task 5: Killed
>> >>> --------------------------------------------------------------------------
>> >>> An ORTE daemon has unexpectedly failed after launch and before
>> >>> communicating back to mpirun. This could be caused by a number
>> >>> of factors, including an inability to create a connection back
>> >>> to mpirun due to a lack of common network interfaces and/or no
>> >>> route found between them. Please check network connectivity
>> >>> (including firewalls and network routing requirements).
>> >>> --------------------------------------------------------------------------
>> >>> [compiler-2:08780] [[42202,0],0] orted_cmd: received halt_vm cmd
>> >>> [compiler-2:08780] mca: base: close: component oob closed
>> >>> [compiler-2:08780] mca: base: close: unloading component oob
>> >>> [compiler-2:08780] [[42202,0],0] TCP SHUTDOWN
>> >>> [compiler-2:08780] mca: base: close: component tcp closed
>> >>> [compiler-2:08780] mca: base: close: unloading component tcp
>> >>> 
>> >> 
>> >> 
>> >> -- 
>> >> Jeff Squyres
>> >> jsquy...@cisco.com
>> >> For corporate legal information go to: 
>> >> http://www.cisco.com/web/about/doing_business/legal/cri/
>> >> 
>> >> 
>> >> 
>> >> 
>> >> 
>> > 
>> > 
>> > -- 
>> > Jeff Squyres
>> > jsquy...@cisco.com
>> > For corporate legal information go to: 
>> > http://www.cisco.com/web/about/doing_business/legal/cri/
>> > 
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> 
>> 
>> 
> 
> 
