Hello!

As far as I can see, the bug is fixed, but with Open MPI v1.9a1r32516 I still have the problem:
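
If it helps to reproduce: hello_c is assumed here to be the stock examples/hello_c.c from the Open MPI tarball, rebuilt roughly like this (a sketch, assuming the freshly installed mpicc is first in PATH):

$ mpicc examples/hello_c.c -o hello_c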

a) Plain run:
$ mpirun  -np 1 ./hello_c
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
b) With --mca oob_tcp_if_include ib0:
$ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------

c) With --mca oob_tcp_if_include ib0 plus the debug/verbose options:

$ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 -np 1 ./hello_c
[compiler-2:14673] mca:base:select:( plm) Querying component [isolated]
[compiler-2:14673] mca:base:select:( plm) Query of component [isolated] set 
priority to 0
[compiler-2:14673] mca:base:select:( plm) Querying component [rsh]
[compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set priority 
to 10
[compiler-2:14673] mca:base:select:( plm) Querying component [slurm]
[compiler-2:14673] mca:base:select:( plm) Query of component [slurm] set 
priority to 75
[compiler-2:14673] mca:base:select:( plm) Selected component [slurm]
[compiler-2:14673] mca: base: components_register: registering oob components
[compiler-2:14673] mca: base: components_register: found loaded component tcp
[compiler-2:14673] mca: base: components_register: component tcp register 
function successful
[compiler-2:14673] mca: base: components_open: opening oob components
[compiler-2:14673] mca: base: components_open: found loaded component tcp
[compiler-2:14673] mca: base: components_open: component tcp open function 
successful
[compiler-2:14673] mca:oob:select: checking available component tcp
[compiler-2:14673] mca:oob:select: Querying component [tcp]
[compiler-2:14673] oob:tcp: component_available called
[compiler-2:14673] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[compiler-2:14673] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
[compiler-2:14673] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
[compiler-2:14673] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
[compiler-2:14673] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
[compiler-2:14673] [[49095,0],0] oob:tcp:init adding 10.128.0.4 to our list of 
V4 connections
[compiler-2:14673] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
[compiler-2:14673] [[49095,0],0] TCP STARTUP
[compiler-2:14673] [[49095,0],0] attempting to bind to IPv4 port 0
[compiler-2:14673] [[49095,0],0] assigned IPv4 port 59460
[compiler-2:14673] mca:oob:select: Adding component to end
[compiler-2:14673] mca:oob:select: Found 1 active transports
[compiler-2:14673] mca: base: components_register: registering rml components
[compiler-2:14673] mca: base: components_register: found loaded component oob
[compiler-2:14673] mca: base: components_register: component oob has no 
register or open function
[compiler-2:14673] mca: base: components_open: opening rml components
[compiler-2:14673] mca: base: components_open: found loaded component oob
[compiler-2:14673] mca: base: components_open: component oob open function 
successful
[compiler-2:14673] orte_rml_base_select: initializing rml component oob
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 30 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 15 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 32 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 33 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 5 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 10 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 12 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 9 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 34 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 2 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 21 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 22 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 45 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 46 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 1 for peer 
[[WILDCARD],WILDCARD]
[compiler-2:14673] [[49095,0],0] posting recv
[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 27 for peer 
[[WILDCARD],WILDCARD]
Daemon was launched on node1-128-01 - beginning to initialize
--------------------------------------------------------------------------
WARNING: An invalid value was given for oob_tcp_if_include. This
value will be ignored.
Local host: node1-128-01
Value: "ib0"
Message: Invalid specification (missing "/")
--------------------------------------------------------------------------
--------------------------------------------------------------------------
None of the TCP networks specified to be included for out-of-band communications
could be found:
Value given:
Please revise the specification and try again.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No network interfaces were found for out-of-band communications. We require
at least one available network for out-of-band messaging.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_oob_base_select failed
--> Returned value (null) (-43) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
srun: error: node1-128-01: task 0: Exited with exit code 213
srun: Terminating job step 661215.0
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
[compiler-2:14673] [[49095,0],0] orted_cmd: received halt_vm cmd
[compiler-2:14673] mca: base: close: component oob closed
[compiler-2:14673] mca: base: close: unloading component oob
[compiler-2:14673] [[49095,0],0] TCP SHUTDOWN
[compiler-2:14673] mca: base: close: component tcp closed
[compiler-2:14673] mca: base: close: unloading component tcp
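
Given the "Invalid specification (missing '/')" warning coming from the remote daemon, it looks like the daemon side currently only accepts CIDR-style subnets for oob_tcp_if_include, not interface names. Two things I can try in the meantime, only as a sketch: the /16 prefix below is a guess (ib0 on the mpirun node appears to be 10.128.0.4, but the real prefix length may differ), and the commands assume iproute2 is installed on the node and that SLURM will give me an interactive step there:

# check that ib0 is up on the compute node and read off its IPv4 subnet
$ srun -w node1-128-01 /sbin/ip -4 addr show ib0

# pass the ib0 network in CIDR form instead of the interface name
$ mpirun --mca oob_tcp_if_include 10.128.0.0/16 -np 1 ./hello_c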


Tue, 12 Aug 2014 18:33:24 +0000 from "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>:
>I filed the following ticket:
>
>     https://svn.open-mpi.org/trac/ompi/ticket/4857
>
>
>On Aug 12, 2014, at 12:39 PM, Jeff Squyres (jsquyres) < jsquy...@cisco.com > 
>wrote:
>
>> (please keep the users list CC'ed)
>> 
>> We talked about this on the weekly engineering call today.  Ralph has an 
>> idea what is happening -- I need to do a little investigation today and file 
>> a bug.  I'll make sure you're CC'ed on the bug ticket.
>> 
>> 
>> 
>> On Aug 12, 2014, at 12:27 PM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>> 
>>> I don't see this error in OMPI 1.9a1r32252 or OMPI 1.8.1 (with --mca
>>> oob_tcp_if_include ib0), but I get it with all of the latest nightly snapshots.
>>> 
>>> 
>>> Tue, 12 Aug 2014 13:08:12 +0000 from "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>:
>>> Are you running any kind of firewall on the node where mpirun is invoked? 
>>> Open MPI needs to be able to use arbitrary TCP ports between the servers on 
>>> which it runs.
>>> 
>>> This second mail seems to imply a bug in OMPI's oob_tcp_if_include param 
>>> handling, however -- it's supposed to be able to handle an interface name 
>>> (not just a network specification).
>>> 
>>> Ralph -- can you have a look?
>>> 
>>> 
>>> On Aug 12, 2014, at 8:41 AM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>> 
>>>> When I add --mca oob_tcp_if_include ib0 (the InfiniBand interface) to mpirun
>>>> (as suggested here:
>>>> http://www.open-mpi.org/community/lists/users/2014/07/24857.php ), I get
>>>> this output:
>>>> 
>>>> [compiler-2:08792] mca:base:select:( plm) Querying component [isolated]
>>>> [compiler-2:08792] mca:base:select:( plm) Query of component [isolated] 
>>>> set priority to 0
>>>> [compiler-2:08792] mca:base:select:( plm) Querying component [rsh]
>>>> [compiler-2:08792] mca:base:select:( plm) Query of component [rsh] set 
>>>> priority to 10
>>>> [compiler-2:08792] mca:base:select:( plm) Querying component [slurm]
>>>> [compiler-2:08792] mca:base:select:( plm) Query of component [slurm] set 
>>>> priority to 75
>>>> [compiler-2:08792] mca:base:select:( plm) Selected component [slurm]
>>>> [compiler-2:08792] mca: base: components_register: registering oob 
>>>> components
>>>> [compiler-2:08792] mca: base: components_register: found loaded component 
>>>> tcp
>>>> [compiler-2:08792] mca: base: components_register: component tcp register 
>>>> function successful
>>>> [compiler-2:08792] mca: base: components_open: opening oob components
>>>> [compiler-2:08792] mca: base: components_open: found loaded component tcp
>>>> [compiler-2:08792] mca: base: components_open: component tcp open function 
>>>> successful
>>>> [compiler-2:08792] mca:oob:select: checking available component tcp
>>>> [compiler-2:08792] mca:oob:select: Querying component [tcp]
>>>> [compiler-2:08792] oob:tcp: component_available called
>>>> [compiler-2:08792] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>> [compiler-2:08792] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>> [compiler-2:08792] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>> [compiler-2:08792] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>> [compiler-2:08792] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>> [compiler-2:08792] [[42190,0],0] oob:tcp:init adding 10.128.0.4 to our 
>>>> list of V4 connections
>>>> [compiler-2:08792] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>> [compiler-2:08792] [[42190,0],0] TCP STARTUP
>>>> [compiler-2:08792] [[42190,0],0] attempting to bind to IPv4 port 0
>>>> [compiler-2:08792] [[42190,0],0] assigned IPv4 port 53883
>>>> [compiler-2:08792] mca:oob:select: Adding component to end
>>>> [compiler-2:08792] mca:oob:select: Found 1 active transports
>>>> [compiler-2:08792] mca: base: components_register: registering rml 
>>>> components
>>>> [compiler-2:08792] mca: base: components_register: found loaded component 
>>>> oob
>>>> [compiler-2:08792] mca: base: components_register: component oob has no 
>>>> register or open function
>>>> [compiler-2:08792] mca: base: components_open: opening rml components
>>>> [compiler-2:08792] mca: base: components_open: found loaded component oob
>>>> [compiler-2:08792] mca: base: components_open: component oob open function 
>>>> successful
>>>> [compiler-2:08792] orte_rml_base_select: initializing rml component oob
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 30 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 15 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 32 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 33 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 5 for peer 
>>>> [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 10 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 12 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 9 for peer 
>>>> [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 34 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 2 for peer 
>>>> [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 21 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 22 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 45 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 46 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 1 for peer 
>>>> [[WILDCARD],WILDCARD]
>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 27 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> Daemon was launched on node1-128-01 - beginning to initialize
>>>> Daemon was launched on node1-128-02 - beginning to initialize
>>>> --------------------------------------------------------------------------
>>>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>>> value will be ignored.
>>>> 
>>>> Local host: node1-128-01
>>>> Value: "ib0"
>>>> Message: Invalid specification (missing "/")
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>>> value will be ignored.
>>>> 
>>>> Local host: node1-128-02
>>>> Value: "ib0"
>>>> Message: Invalid specification (missing "/")
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> None of the TCP networks specified to be included for out-of-band 
>>>> communications
>>>> could be found:
>>>> 
>>>> Value given:
>>>> 
>>>> Please revise the specification and try again.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> None of the TCP networks specified to be included for out-of-band 
>>>> communications
>>>> could be found:
>>>> 
>>>> Value given:
>>>> 
>>>> Please revise the specification and try again.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> No network interfaces were found for out-of-band communications. We require
>>>> at least one available network for out-of-band messaging.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> No network interfaces were found for out-of-band communications. We require
>>>> at least one available network for out-of-band messaging.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> It looks like orte_init failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during orte_init; some of which are due to configuration or
>>>> environment problems. This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>> 
>>>> orte_oob_base_select failed
>>>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> It looks like orte_init failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during orte_init; some of which are due to configuration or
>>>> environment problems. This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open MPI developer):
>>>> 
>>>> orte_oob_base_select failed
>>>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>> --------------------------------------------------------------------------
>>>> srun: error: node1-128-02: task 1: Exited with exit code 213
>>>> srun: Terminating job step 657300.0
>>>> srun: error: node1-128-01: task 0: Exited with exit code 213
>>>> --------------------------------------------------------------------------
>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>> communicating back to mpirun. This could be caused by a number
>>>> of factors, including an inability to create a connection back
>>>> to mpirun due to a lack of common network interfaces and/or no
>>>> route found between them. Please check network connectivity
>>>> (including firewalls and network routing requirements).
>>>> --------------------------------------------------------------------------
>>>> [compiler-2:08792] [[42190,0],0] orted_cmd: received halt_vm cmd
>>>> [compiler-2:08792] mca: base: close: component oob closed
>>>> [compiler-2:08792] mca: base: close: unloading component oob
>>>> [compiler-2:08792] [[42190,0],0] TCP SHUTDOWN
>>>> [compiler-2:08792] mca: base: close: component tcp closed
>>>> [compiler-2:08792] mca: base: close: unloading component tcp
>>>> 
>>>> 
>>>> 
>>>> Tue, 12 Aug 2014 16:14:58 +0400 from Timur Ismagilov <tismagi...@mail.ru>:
>>>> Hello!
>>>> 
>>>> I have Open MPI v1.8.2rc4r32485
>>>> 
>>>> When I run hello_c, I get this error message:
>>>> $ mpirun -np 2 hello_c
>>>> 
>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>> 
>>>> communicating back to mpirun. This could be caused by a number
>>>> of factors, including an inability to create a connection back
>>>> to mpirun due to a lack of common network interfaces and/or no
>>>> route found between them. Please check network connectivity
>>>> (including firewalls and network routing requirements).
>>>> 
>>>> When I run with --debug-daemons --mca plm_base_verbose 5 -mca
>>>> oob_base_verbose 10 -mca rml_base_verbose 10, I get this output:
>>>> $ mpirun --debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10
>>>> -mca rml_base_verbose 10 -np 2 hello_c
>>>> 
>>>> [compiler-2:08780] mca:base:select:( plm) Querying component [isolated]
>>>> [compiler-2:08780] mca:base:select:( plm) Query of component [isolated] 
>>>> set priority to 0
>>>> [compiler-2:08780] mca:base:select:( plm) Querying component [rsh]
>>>> [compiler-2:08780] mca:base:select:( plm) Query of component [rsh] set 
>>>> priority to 10
>>>> [compiler-2:08780] mca:base:select:( plm) Querying component [slurm]
>>>> [compiler-2:08780] mca:base:select:( plm) Query of component [slurm] set 
>>>> priority to 75
>>>> [compiler-2:08780] mca:base:select:( plm) Selected component [slurm]
>>>> [compiler-2:08780] mca: base: components_register: registering oob 
>>>> components
>>>> [compiler-2:08780] mca: base: components_register: found loaded component 
>>>> tcp
>>>> [compiler-2:08780] mca: base: components_register: component tcp register 
>>>> function successful
>>>> [compiler-2:08780] mca: base: components_open: opening oob components
>>>> [compiler-2:08780] mca: base: components_open: found loaded component tcp
>>>> [compiler-2:08780] mca: base: components_open: component tcp open function 
>>>> successful
>>>> [compiler-2:08780] mca:oob:select: checking available component tcp
>>>> [compiler-2:08780] mca:oob:select: Querying component [tcp]
>>>> [compiler-2:08780] oob:tcp: component_available called
>>>> [compiler-2:08780] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>> [compiler-2:08780] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.251.53 to our 
>>>> list of V4 connections
>>>> [compiler-2:08780] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.0.4 to our list 
>>>> of V4 connections
>>>> [compiler-2:08780] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.2.251.14 to our 
>>>> list of V4 connections
>>>> [compiler-2:08780] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.128.0.4 to our 
>>>> list of V4 connections
>>>> [compiler-2:08780] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 93.180.7.38 to our 
>>>> list of V4 connections
>>>> [compiler-2:08780] [[42202,0],0] TCP STARTUP
>>>> [compiler-2:08780] [[42202,0],0] attempting to bind to IPv4 port 0
>>>> [compiler-2:08780] [[42202,0],0] assigned IPv4 port 38420
>>>> [compiler-2:08780] mca:oob:select: Adding component to end
>>>> [compiler-2:08780] mca:oob:select: Found 1 active transports
>>>> [compiler-2:08780] mca: base: components_register: registering rml 
>>>> components
>>>> [compiler-2:08780] mca: base: components_register: found loaded component 
>>>> oob
>>>> [compiler-2:08780] mca: base: components_register: component oob has no 
>>>> register or open function
>>>> [compiler-2:08780] mca: base: components_open: opening rml components
>>>> [compiler-2:08780] mca: base: components_open: found loaded component oob
>>>> [compiler-2:08780] mca: base: components_open: component oob open function 
>>>> successful
>>>> [compiler-2:08780] orte_rml_base_select: initializing rml component oob
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 30 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 15 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 32 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 33 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 5 for peer 
>>>> [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 10 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 12 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 9 for peer 
>>>> [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 34 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 2 for peer 
>>>> [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 21 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 22 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 45 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 46 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 1 for peer 
>>>> [[WILDCARD],WILDCARD]
>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 27 for 
>>>> peer [[WILDCARD],WILDCARD]
>>>> Daemon was launched on node1-130-08 - beginning to initialize
>>>> Daemon was launched on node1-130-03 - beginning to initialize
>>>> Daemon was launched on node1-130-05 - beginning to initialize
>>>> Daemon was launched on node1-130-02 - beginning to initialize
>>>> Daemon was launched on node1-130-01 - beginning to initialize
>>>> Daemon was launched on node1-130-04 - beginning to initialize
>>>> Daemon was launched on node1-130-07 - beginning to initialize
>>>> Daemon was launched on node1-130-06 - beginning to initialize
>>>> Daemon [[42202,0],3] checking in as pid 7178 on host node1-130-03
>>>> [node1-130-03:07178] [[42202,0],3] orted: up and running - waiting for 
>>>> commands!
>>>> Daemon [[42202,0],2] checking in as pid 13581 on host node1-130-02
>>>> [node1-130-02:13581] [[42202,0],2] orted: up and running - waiting for 
>>>> commands!
>>>> Daemon [[42202,0],1] checking in as pid 17220 on host node1-130-01
>>>> [node1-130-01:17220] [[42202,0],1] orted: up and running - waiting for 
>>>> commands!
>>>> Daemon [[42202,0],5] checking in as pid 6663 on host node1-130-05
>>>> [node1-130-05:06663] [[42202,0],5] orted: up and running - waiting for 
>>>> commands!
>>>> Daemon [[42202,0],8] checking in as pid 6683 on host node1-130-08
>>>> [node1-130-08:06683] [[42202,0],8] orted: up and running - waiting for 
>>>> commands!
>>>> Daemon [[42202,0],7] checking in as pid 7877 on host node1-130-07
>>>> [node1-130-07:07877] [[42202,0],7] orted: up and running - waiting for 
>>>> commands!
>>>> Daemon [[42202,0],4] checking in as pid 7735 on host node1-130-04
>>>> [node1-130-04:07735] [[42202,0],4] orted: up and running - waiting for 
>>>> commands!
>>>> Daemon [[42202,0],6] checking in as pid 8451 on host node1-130-06
>>>> [node1-130-06:08451] [[42202,0],6] orted: up and running - waiting for 
>>>> commands!
>>>> srun: error: node1-130-03: task 2: Exited with exit code 1
>>>> srun: Terminating job step 657040.1
>>>> srun: error: node1-130-02: task 1: Exited with exit code 1
>>>> slurmd[node1-130-04]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH 
>>>> SIGNAL 9 ***
>>>> slurmd[node1-130-07]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH 
>>>> SIGNAL 9 ***
>>>> slurmd[node1-130-06]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH 
>>>> SIGNAL 9 ***
>>>> srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
>>>> srun: error: node1-130-01: task 0: Exited with exit code 1
>>>> srun: error: node1-130-05: task 4: Exited with exit code 1
>>>> srun: error: node1-130-08: task 7: Exited with exit code 1
>>>> srun: error: node1-130-07: task 6: Exited with exit code 1
>>>> srun: error: node1-130-04: task 3: Killed
>>>> srun: error: node1-130-06: task 5: Killed
>>>> --------------------------------------------------------------------------
>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>> communicating back to mpirun. This could be caused by a number
>>>> of factors, including an inability to create a connection back
>>>> to mpirun due to a lack of common network interfaces and/or no
>>>> route found between them. Please check network connectivity
>>>> (including firewalls and network routing requirements).
>>>> --------------------------------------------------------------------------
>>>> [compiler-2:08780] [[42202,0],0] orted_cmd: received halt_vm cmd
>>>> [compiler-2:08780] mca: base: close: component oob closed
>>>> [compiler-2:08780] mca: base: close: unloading component oob
>>>> [compiler-2:08780] [[42202,0],0] TCP SHUTDOWN
>>>> [compiler-2:08780] mca: base: close: component tcp closed
>>>> [compiler-2:08780] mca: base: close: unloading component tcp
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>>  us...@open-mpi.org
>>>> Subscription:  http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:  
>>>> http://www.open-mpi.org/community/lists/users/2014/08/24987.php
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>>  us...@open-mpi.org
>>>> Subscription:  http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:  
>>>> http://www.open-mpi.org/community/lists/users/2014/08/24988.php
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>>  jsquy...@cisco.com
>>> For corporate legal information go to:  
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Jeff Squyres
>>  jsquy...@cisco.com
>> For corporate legal information go to:  
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> _______________________________________________
>> users mailing list
>>  us...@open-mpi.org
>> Subscription:  http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:  
>> http://www.open-mpi.org/community/lists/users/2014/08/25001.php
>
>
>-- 
>Jeff Squyres
>jsquy...@cisco.com
>For corporate legal information go to:  
>http://www.cisco.com/web/about/doing_business/legal/cri/
>
>



