Not sure I understand. The problem has been fixed in both the trunk and the 1.8 
branch now, so you should be able to work with either of those nightly builds.
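
If it helps, here is a rough, untested sketch of pulling down a nightly tarball and 
building it into a scratch prefix so it does not disturb your existing installs. The 
tarball name below is only a placeholder -- substitute whatever the listing under 
http://www.open-mpi.org/nightly/ actually shows for trunk or v1.8:

    # placeholder name -- check the nightly directory listing for the real snapshot
    wget http://www.open-mpi.org/nightly/v1.8/openmpi-v1.8-latest.tar.bz2
    tar xjf openmpi-v1.8-latest.tar.bz2
    cd openmpi-v1.8-*
    ./configure --prefix=$HOME/ompi-nightly
    make -j8 all install
    export PATH=$HOME/ompi-nightly/bin:$PATH
    export LD_LIBRARY_PATH=$HOME/ompi-nightly/lib:$LD_LIBRARY_PATH
    mpirun -np 1 ./hello_c

If the fix is in place, the hello_c run should complete instead of hitting the ORTE 
daemon failure shown below.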

On Aug 21, 2014, at 12:02 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:

> Is there any way I can run MPI jobs in the meantime?
> 
> 
> Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain <r...@open-mpi.org>:
> Yes, I know - it has been CMR'd (a change management request to bring the fix over 
> to the release branch has been filed).
> 
> On Aug 20, 2014, at 10:26 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
> 
>> BTW, we get the same error in the v1.8 branch as well.
>> 
>> 
>> On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> It was not yet fixed - but should be now.
>> 
>> On Aug 20, 2014, at 6:39 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>> 
>>> Hello!
>>> 
>>> As far as I can see, the bug has been fixed, but with Open MPI v1.9a1r32516 I 
>>> still have the problem:
>>> 
>>> a)
>>> $ mpirun -np 1 ./hello_c
>>> 
>>> --------------------------------------------------------------------------
>>> An ORTE daemon has unexpectedly failed after launch and before
>>> communicating back to mpirun. This could be caused by a number
>>> of factors, including an inability to create a connection back
>>> to mpirun due to a lack of common network interfaces and/or no
>>> route found between them. Please check network connectivity
>>> (including firewalls and network routing requirements).
>>> --------------------------------------------------------------------------
>>> 
>>> b)
>>> $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
>>> --------------------------------------------------------------------------
>>> An ORTE daemon has unexpectedly failed after launch and before
>>> communicating back to mpirun. This could be caused by a number
>>> of factors, including an inability to create a connection back
>>> to mpirun due to a lack of common network interfaces and/or no
>>> route found between them. Please check network connectivity
>>> (including firewalls and network routing requirements).
>>> --------------------------------------------------------------------------
>>> 
>>> c)
>>> 
>>> $ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca plm_base_verbose 
>>> 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 -np 1 ./hello_c
>>> 
>>> [compiler-2:14673] mca:base:select:( plm) Querying component [isolated]
>>> [compiler-2:14673] mca:base:select:( plm) Query of component [isolated] set 
>>> priority to 0
>>> [compiler-2:14673] mca:base:select:( plm) Querying component [rsh]
>>> [compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set 
>>> priority to 10
>>> [compiler-2:14673] mca:base:select:( plm) Querying component [slurm]
>>> [compiler-2:14673] mca:base:select:( plm) Query of component [slurm] set 
>>> priority to 75
>>> [compiler-2:14673] mca:base:select:( plm) Selected component [slurm]
>>> [compiler-2:14673] mca: base: components_register: registering oob 
>>> components
>>> [compiler-2:14673] mca: base: components_register: found loaded component 
>>> tcp
>>> [compiler-2:14673] mca: base: components_register: component tcp register 
>>> function successful
>>> [compiler-2:14673] mca: base: components_open: opening oob components
>>> [compiler-2:14673] mca: base: components_open: found loaded component tcp
>>> [compiler-2:14673] mca: base: components_open: component tcp open function 
>>> successful
>>> [compiler-2:14673] mca:oob:select: checking available component tcp
>>> [compiler-2:14673] mca:oob:select: Querying component [tcp]
>>> [compiler-2:14673] oob:tcp: component_available called
>>> [compiler-2:14673] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>> [compiler-2:14673] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>> [compiler-2:14673] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>> [compiler-2:14673] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>> [compiler-2:14673] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>> [compiler-2:14673] [[49095,0],0] oob:tcp:init adding 10.128.0.4 to our list 
>>> of V4 connections
>>> [compiler-2:14673] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>> [compiler-2:14673] [[49095,0],0] TCP STARTUP
>>> [compiler-2:14673] [[49095,0],0] attempting to bind to IPv4 port 0
>>> [compiler-2:14673] [[49095,0],0] assigned IPv4 port 59460
>>> [compiler-2:14673] mca:oob:select: Adding component to end
>>> [compiler-2:14673] mca:oob:select: Found 1 active transports
>>> [compiler-2:14673] mca: base: components_register: registering rml 
>>> components
>>> [compiler-2:14673] mca: base: components_register: found loaded component 
>>> oob
>>> [compiler-2:14673] mca: base: components_register: component oob has no 
>>> register or open function
>>> [compiler-2:14673] mca: base: components_open: opening rml components
>>> [compiler-2:14673] mca: base: components_open: found loaded component oob
>>> [compiler-2:14673] mca: base: components_open: component oob open function 
>>> successful
>>> [compiler-2:14673] orte_rml_base_select: initializing rml component oob
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 30 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 15 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 32 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 33 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 5 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 10 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 12 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 9 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 34 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 2 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 21 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 22 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 45 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 46 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 1 for peer 
>>> [[WILDCARD],WILDCARD]
>>> [compiler-2:14673] [[49095,0],0] posting recv
>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 27 for peer 
>>> [[WILDCARD],WILDCARD]
>>> Daemon was launched on node1-128-01 - beginning to initialize
>>> --------------------------------------------------------------------------
>>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>> value will be ignored.
>>> 
>>> Local host: node1-128-01
>>> Value: "ib0"
>>> Message: Invalid specification (missing "/")
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> None of the TCP networks specified to be included for out-of-band 
>>> communications
>>> could be found:
>>> 
>>> Value given:
>>> 
>>> Please revise the specification and try again.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> No network interfaces were found for out-of-band communications. We require
>>> at least one available network for out-of-band messaging.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>> orte_oob_base_select failed
>>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> srun: error: node1-128-01: task 0: Exited with exit code 213
>>> srun: Terminating job step 661215.0
>>> --------------------------------------------------------------------------
>>> An ORTE daemon has unexpectedly failed after launch and before
>>> communicating back to mpirun. This could be caused by a number
>>> of factors, including an inability to create a connection back
>>> to mpirun due to a lack of common network interfaces and/or no
>>> route found between them. Please check network connectivity
>>> (including firewalls and network routing requirements).
>>> --------------------------------------------------------------------------
>>> [compiler-2:14673] [[49095,0],0] orted_cmd: received halt_vm cmd
>>> [compiler-2:14673] mca: base: close: component oob closed
>>> [compiler-2:14673] mca: base: close: unloading component oob
>>> [compiler-2:14673] [[49095,0],0] TCP SHUTDOWN
>>> [compiler-2:14673] mca: base: close: component tcp closed
>>> [compiler-2:14673] mca: base: close: unloading component tcp
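
One possible stopgap, sketched here only and untested, until a fixed nightly can be 
picked up: the "Invalid specification (missing "/")" warning above indicates that the 
broken parser still accepts a network given in CIDR form, so passing the ib0 subnet 
instead of the interface name may at least get past the parameter error. The 
10.128.0.0/16 value is just a guess based on the 10.128.0.4 address that oob:tcp:init 
reports -- use the real network/netmask of ib0 on the cluster:

    $ mpirun --mca oob_tcp_if_include 10.128.0.0/16 -np 1 ./hello_c

Whether this also avoids the daemon-launch failure itself is a separate question, but 
it removes the oob_tcp_if_include parsing problem from the picture.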
>>> 
>>> 
>>> 
>>> 
>>> Tue, 12 Aug 2014 18:33:24 +0000 from "Jeff Squyres (jsquyres)" 
>>> <jsquy...@cisco.com>:
>>> I filed the following ticket:
>>> 
>>>     https://svn.open-mpi.org/trac/ompi/ticket/4857
>>> 
>>> 
>>> On Aug 12, 2014, at 12:39 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>>> wrote:
>>> 
>>> > (please keep the users list CC'ed)
>>> > 
>>> > We talked about this on the weekly engineering call today. Ralph has an 
>>> > idea of what is happening -- I need to do a little investigation today and 
>>> > file a bug. I'll make sure you're CC'ed on the bug ticket.
>>> > 
>>> > 
>>> > 
>>> > On Aug 12, 2014, at 12:27 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>> > 
>>> >> I don't have this error with OMPI 1.9a1r32252 or OMPI 1.8.1 (with --mca 
>>> >> oob_tcp_if_include ib0), but with all of the latest nightly snapshots I get 
>>> >> this error.
>>> >> 
>>> >> 
>>> >> Tue, 12 Aug 2014 13:08:12 +0000 from "Jeff Squyres (jsquyres)" 
>>> >> <jsquy...@cisco.com>:
>>> >> Are you running any kind of firewall on the node where mpirun is 
>>> >> invoked? Open MPI needs to be able to use arbitrary TCP ports between 
>>> >> the servers on which it runs.
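
As a quick sanity check of that -- a sketch with placeholder values only, where port 
5555 is arbitrary and the host names are taken from the logs above -- start a listener 
on the mpirun node and probe it from a compute node. Depending on the netcat flavor, 
the listener may need to be "nc -l -p 5555" instead:

    # on the mpirun node (compiler-2 in the logs):
    iptables -L -n        # look for REJECT/DROP rules; usually needs root
    nc -l 5555 &
    # from a compute node, e.g. node1-128-01:
    nc -z -w 3 compiler-2 5555 && echo reachable || echo blocked

If the probe reports "blocked", a firewall or routing problem between the nodes is the 
likely culprit.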
>>> >> 
>>> >> This second mail seems to imply a bug in OMPI's oob_tcp_if_include param 
>>> >> handling, however -- it's supposed to be able to handle an interface 
>>> >> name (not just a network specification).
>>> >> 
>>> >> Ralph -- can you have a look?
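
For reference, how a given build documents that parameter can be checked with 
ompi_info; a sketch, noting that on 1.7/1.8-era builds the --level option is needed 
to expose more than the basic parameters:

    ompi_info --param oob tcp --level 9 | grep if_include

That shows the help string and current value of oob_tcp_if_include for whichever 
install is first in the PATH.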
>>> >> 
>>> >> 
>>> >> On Aug 12, 2014, at 8:41 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>> >> 
>>> >>> When I add --mca oob_tcp_if_include ib0 (the InfiniBand interface) to 
>>> >>> mpirun (as suggested here: 
>>> >>> http://www.open-mpi.org/community/lists/users/2014/07/24857.php ), I get 
>>> >>> this output:
>>> >>> 
>>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component [isolated]
>>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component [isolated] 
>>> >>> set priority to 0
>>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component [rsh]
>>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component [rsh] set 
>>> >>> priority to 10
>>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component [slurm]
>>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component [slurm] 
>>> >>> set priority to 75
>>> >>> [compiler-2:08792] mca:base:select:( plm) Selected component [slurm]
>>> >>> [compiler-2:08792] mca: base: components_register: registering oob 
>>> >>> components
>>> >>> [compiler-2:08792] mca: base: components_register: found loaded 
>>> >>> component tcp
>>> >>> [compiler-2:08792] mca: base: components_register: component tcp 
>>> >>> register function successful
>>> >>> [compiler-2:08792] mca: base: components_open: opening oob components
>>> >>> [compiler-2:08792] mca: base: components_open: found loaded component 
>>> >>> tcp
>>> >>> [compiler-2:08792] mca: base: components_open: component tcp open 
>>> >>> function successful
>>> >>> [compiler-2:08792] mca:oob:select: checking available component tcp
>>> >>> [compiler-2:08792] mca:oob:select: Querying component [tcp]
>>> >>> [compiler-2:08792] oob:tcp: component_available called
>>> >>> [compiler-2:08792] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>> >>> [compiler-2:08792] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>> >>> [compiler-2:08792] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>> >>> [compiler-2:08792] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>> >>> [compiler-2:08792] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>> >>> [compiler-2:08792] [[42190,0],0] oob:tcp:init adding 10.128.0.4 to our 
>>> >>> list of V4 connections
>>> >>> [compiler-2:08792] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>> >>> [compiler-2:08792] [[42190,0],0] TCP STARTUP
>>> >>> [compiler-2:08792] [[42190,0],0] attempting to bind to IPv4 port 0
>>> >>> [compiler-2:08792] [[42190,0],0] assigned IPv4 port 53883
>>> >>> [compiler-2:08792] mca:oob:select: Adding component to end
>>> >>> [compiler-2:08792] mca:oob:select: Found 1 active transports
>>> >>> [compiler-2:08792] mca: base: components_register: registering rml 
>>> >>> components
>>> >>> [compiler-2:08792] mca: base: components_register: found loaded 
>>> >>> component oob
>>> >>> [compiler-2:08792] mca: base: components_register: component oob has no 
>>> >>> register or open function
>>> >>> [compiler-2:08792] mca: base: components_open: opening rml components
>>> >>> [compiler-2:08792] mca: base: components_open: found loaded component 
>>> >>> oob
>>> >>> [compiler-2:08792] mca: base: components_open: component oob open 
>>> >>> function successful
>>> >>> [compiler-2:08792] orte_rml_base_select: initializing rml component oob
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 30 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 15 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 32 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 33 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 5 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 10 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 12 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 9 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 34 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 2 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 21 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 22 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 45 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 46 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 1 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 27 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> Daemon was launched on node1-128-01 - beginning to initialize
>>> >>> Daemon was launched on node1-128-02 - beginning to initialize
>>> >>> --------------------------------------------------------------------------
>>> >>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>> >>> value will be ignored.
>>> >>> 
>>> >>> Local host: node1-128-01
>>> >>> Value: "ib0"
>>> >>> Message: Invalid specification (missing "/")
>>> >>> --------------------------------------------------------------------------
>>> >>> --------------------------------------------------------------------------
>>> >>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>> >>> value will be ignored.
>>> >>> 
>>> >>> Local host: node1-128-02
>>> >>> Value: "ib0"
>>> >>> Message: Invalid specification (missing "/")
>>> >>> --------------------------------------------------------------------------
>>> >>> --------------------------------------------------------------------------
>>> >>> None of the TCP networks specified to be included for out-of-band 
>>> >>> communications
>>> >>> could be found:
>>> >>> 
>>> >>> Value given:
>>> >>> 
>>> >>> Please revise the specification and try again.
>>> >>> --------------------------------------------------------------------------
>>> >>> --------------------------------------------------------------------------
>>> >>> None of the TCP networks specified to be included for out-of-band 
>>> >>> communications
>>> >>> could be found:
>>> >>> 
>>> >>> Value given:
>>> >>> 
>>> >>> Please revise the specification and try again.
>>> >>> --------------------------------------------------------------------------
>>> >>> --------------------------------------------------------------------------
>>> >>> No network interfaces were found for out-of-band communications. We 
>>> >>> require
>>> >>> at least one available network for out-of-band messaging.
>>> >>> --------------------------------------------------------------------------
>>> >>> --------------------------------------------------------------------------
>>> >>> No network interfaces were found for out-of-band communications. We 
>>> >>> require
>>> >>> at least one available network for out-of-band messaging.
>>> >>> --------------------------------------------------------------------------
>>> >>> --------------------------------------------------------------------------
>>> >>> It looks like orte_init failed for some reason; your parallel process is
>>> >>> likely to abort. There are many reasons that a parallel process can
>>> >>> fail during orte_init; some of which are due to configuration or
>>> >>> environment problems. This failure appears to be an internal failure;
>>> >>> here's some additional information (which may only be relevant to an
>>> >>> Open MPI developer):
>>> >>> 
>>> >>> orte_oob_base_select failed
>>> >>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>> >>> --------------------------------------------------------------------------
>>> >>> --------------------------------------------------------------------------
>>> >>> It looks like orte_init failed for some reason; your parallel process is
>>> >>> likely to abort. There are many reasons that a parallel process can
>>> >>> fail during orte_init; some of which are due to configuration or
>>> >>> environment problems. This failure appears to be an internal failure;
>>> >>> here's some additional information (which may only be relevant to an
>>> >>> Open MPI developer):
>>> >>> 
>>> >>> orte_oob_base_select failed
>>> >>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>> >>> --------------------------------------------------------------------------
>>> >>> srun: error: node1-128-02: task 1: Exited with exit code 213
>>> >>> srun: Terminating job step 657300.0
>>> >>> srun: error: node1-128-01: task 0: Exited with exit code 213
>>> >>> --------------------------------------------------------------------------
>>> >>> An ORTE daemon has unexpectedly failed after launch and before
>>> >>> communicating back to mpirun. This could be caused by a number
>>> >>> of factors, including an inability to create a connection back
>>> >>> to mpirun due to a lack of common network interfaces and/or no
>>> >>> route found between them. Please check network connectivity
>>> >>> (including firewalls and network routing requirements).
>>> >>> --------------------------------------------------------------------------
>>> >>> [compiler-2:08792] [[42190,0],0] orted_cmd: received halt_vm cmd
>>> >>> [compiler-2:08792] mca: base: close: component oob closed
>>> >>> [compiler-2:08792] mca: base: close: unloading component oob
>>> >>> [compiler-2:08792] [[42190,0],0] TCP SHUTDOWN
>>> >>> [compiler-2:08792] mca: base: close: component tcp closed
>>> >>> [compiler-2:08792] mca: base: close: unloading component tcp
>>> >>> 
>>> >>> 
>>> >>> 
>>> >>> Tue, 12 Aug 2014 16:14:58 +0400 from Timur Ismagilov <tismagi...@mail.ru>:
>>> >>> Hello!
>>> >>> 
>>> >>> I have Open MPI v1.8.2rc4r32485
>>> >>> 
>>> >>> When I run hello_c, I get this error message:
>>> >>> $ mpirun -np 2 hello_c
>>> >>> 
>>> >>> An ORTE daemon has unexpectedly failed after launch and before
>>> >>> communicating back to mpirun. This could be caused by a number
>>> >>> of factors, including an inability to create a connection back
>>> >>> to mpirun due to a lack of common network interfaces and/or no
>>> >>> route found between them. Please check network connectivity
>>> >>> (including firewalls and network routing requirements).
>>> >>> 
>>> >>> When I run with --debug-daemons --mca plm_base_verbose 5 -mca 
>>> >>> oob_base_verbose 10 -mca rml_base_verbose 10, I get this output:
>>> >>> $ mpirun --debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 
>>> >>> 10 -mca rml_base_verbose 10 -np 2 hello_c
>>> >>> 
>>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component [isolated]
>>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component [isolated] 
>>> >>> set priority to 0
>>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component [rsh]
>>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component [rsh] set 
>>> >>> priority to 10
>>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component [slurm]
>>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component [slurm] 
>>> >>> set priority to 75
>>> >>> [compiler-2:08780] mca:base:select:( plm) Selected component [slurm]
>>> >>> [compiler-2:08780] mca: base: components_register: registering oob 
>>> >>> components
>>> >>> [compiler-2:08780] mca: base: components_register: found loaded 
>>> >>> component tcp
>>> >>> [compiler-2:08780] mca: base: components_register: component tcp 
>>> >>> register function successful
>>> >>> [compiler-2:08780] mca: base: components_open: opening oob components
>>> >>> [compiler-2:08780] mca: base: components_open: found loaded component 
>>> >>> tcp
>>> >>> [compiler-2:08780] mca: base: components_open: component tcp open 
>>> >>> function successful
>>> >>> [compiler-2:08780] mca:oob:select: checking available component tcp
>>> >>> [compiler-2:08780] mca:oob:select: Querying component [tcp]
>>> >>> [compiler-2:08780] oob:tcp: component_available called
>>> >>> [compiler-2:08780] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>> >>> [compiler-2:08780] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.251.53 to our 
>>> >>> list of V4 connections
>>> >>> [compiler-2:08780] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.0.4 to our 
>>> >>> list of V4 connections
>>> >>> [compiler-2:08780] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.2.251.14 to our 
>>> >>> list of V4 connections
>>> >>> [compiler-2:08780] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.128.0.4 to our 
>>> >>> list of V4 connections
>>> >>> [compiler-2:08780] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 93.180.7.38 to our 
>>> >>> list of V4 connections
>>> >>> [compiler-2:08780] [[42202,0],0] TCP STARTUP
>>> >>> [compiler-2:08780] [[42202,0],0] attempting to bind to IPv4 port 0
>>> >>> [compiler-2:08780] [[42202,0],0] assigned IPv4 port 38420
>>> >>> [compiler-2:08780] mca:oob:select: Adding component to end
>>> >>> [compiler-2:08780] mca:oob:select: Found 1 active transports
>>> >>> [compiler-2:08780] mca: base: components_register: registering rml 
>>> >>> components
>>> >>> [compiler-2:08780] mca: base: components_register: found loaded 
>>> >>> component oob
>>> >>> [compiler-2:08780] mca: base: components_register: component oob has no 
>>> >>> register or open function
>>> >>> [compiler-2:08780] mca: base: components_open: opening rml components
>>> >>> [compiler-2:08780] mca: base: components_open: found loaded component 
>>> >>> oob
>>> >>> [compiler-2:08780] mca: base: components_open: component oob open 
>>> >>> function successful
>>> >>> [compiler-2:08780] orte_rml_base_select: initializing rml component oob
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 30 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 15 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 32 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 33 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 5 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 10 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 12 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 9 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 34 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 2 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 21 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 22 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 45 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 46 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 1 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 27 for 
>>> >>> peer [[WILDCARD],WILDCARD]
>>> >>> Daemon was launched on node1-130-08 - beginning to initialize
>>> >>> Daemon was launched on node1-130-03 - beginning to initialize
>>> >>> Daemon was launched on node1-130-05 - beginning to initialize
>>> >>> Daemon was launched on node1-130-02 - beginning to initialize
>>> >>> Daemon was launched on node1-130-01 - beginning to initialize
>>> >>> Daemon was launched on node1-130-04 - beginning to initialize
>>> >>> Daemon was launched on node1-130-07 - beginning to initialize
>>> >>> Daemon was launched on node1-130-06 - beginning to initialize
>>> >>> Daemon [[42202,0],3] checking in as pid 7178 on host node1-130-03
>>> >>> [node1-130-03:07178] [[42202,0],3] orted: up and running - waiting for 
>>> >>> commands!
>>> >>> Daemon [[42202,0],2] checking in as pid 13581 on host node1-130-02
>>> >>> [node1-130-02:13581] [[42202,0],2] orted: up and running - waiting for 
>>> >>> commands!
>>> >>> Daemon [[42202,0],1] checking in as pid 17220 on host node1-130-01
>>> >>> [node1-130-01:17220] [[42202,0],1] orted: up and running - waiting for 
>>> >>> commands!
>>> >>> Daemon [[42202,0],5] checking in as pid 6663 on host node1-130-05
>>> >>> [node1-130-05:06663] [[42202,0],5] orted: up and running - waiting for 
>>> >>> commands!
>>> >>> Daemon [[42202,0],8] checking in as pid 6683 on host node1-130-08
>>> >>> [node1-130-08:06683] [[42202,0],8] orted: up and running - waiting for 
>>> >>> commands!
>>> >>> Daemon [[42202,0],7] checking in as pid 7877 on host node1-130-07
>>> >>> [node1-130-07:07877] [[42202,0],7] orted: up and running - waiting for 
>>> >>> commands!
>>> >>> Daemon [[42202,0],4] checking in as pid 7735 on host node1-130-04
>>> >>> [node1-130-04:07735] [[42202,0],4] orted: up and running - waiting for 
>>> >>> commands!
>>> >>> Daemon [[42202,0],6] checking in as pid 8451 on host node1-130-06
>>> >>> [node1-130-06:08451] [[42202,0],6] orted: up and running - waiting for 
>>> >>> commands!
>>> >>> srun: error: node1-130-03: task 2: Exited with exit code 1
>>> >>> srun: Terminating job step 657040.1
>>> >>> srun: error: node1-130-02: task 1: Exited with exit code 1
>>> >>> slurmd[node1-130-04]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 
>>> >>> WITH SIGNAL 9 ***
>>> >>> slurmd[node1-130-07]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 
>>> >>> WITH SIGNAL 9 ***
>>> >>> slurmd[node1-130-06]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 
>>> >>> WITH SIGNAL 9 ***
>>> >>> srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
>>> >>> srun: error: node1-130-01: task 0: Exited with exit code 1
>>> >>> srun: error: node1-130-05: task 4: Exited with exit code 1
>>> >>> srun: error: node1-130-08: task 7: Exited with exit code 1
>>> >>> srun: error: node1-130-07: task 6: Exited with exit code 1
>>> >>> srun: error: node1-130-04: task 3: Killed
>>> >>> srun: error: node1-130-06: task 5: Killed
>>> >>> --------------------------------------------------------------------------
>>> >>> An ORTE daemon has unexpectedly failed after launch and before
>>> >>> communicating back to mpirun. This could be caused by a number
>>> >>> of factors, including an inability to create a connection back
>>> >>> to mpirun due to a lack of common network interfaces and/or no
>>> >>> route found between them. Please check network connectivity
>>> >>> (including firewalls and network routing requirements).
>>> >>> --------------------------------------------------------------------------
>>> >>> [compiler-2:08780] [[42202,0],0] orted_cmd: received halt_vm cmd
>>> >>> [compiler-2:08780] mca: base: close: component oob closed
>>> >>> [compiler-2:08780] mca: base: close: unloading component oob
>>> >>> [compiler-2:08780] [[42202,0],0] TCP SHUTDOWN
>>> >>> [compiler-2:08780] mca: base: close: component tcp closed
>>> >>> [compiler-2:08780] mca: base: close: unloading component tcp
>>> >>> 