When I try to specify the OOB interface with --mca oob_tcp_if_include <one of the 
interfaces from ifconfig>, I always get this error:
$ mpirun  --mca oob_tcp_if_include ib0 -np 1 ./hello_c
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
-------------------------------------------------------------------------

Earlier, with Open MPI 1.8.1, I could not run MPI jobs without "--mca 
oob_tcp_if_include ib0"; but now (Open MPI 1.9a1) this same flag gives the 
error above.
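One thing I have not tried yet (so this is only a guess): the warnings in my earlier logs said the oob_tcp_if_include value was "Invalid specification (missing \"/\")", which suggests the daemons want the subnet in CIDR form rather than an interface name. From the ifconfig output below, ib0 is 10.128.0.4 with mask 255.255.0.0, i.e. subnet 10.128.0.0/16. A small sketch that derives the prefix length from the dotted mask (the variable names here are my own, nothing Open MPI-specific):

```shell
# Derive a CIDR prefix length from ib0's dotted netmask (255.255.0.0 -> 16)
# by counting the set bits in each octet.
mask=255.255.0.0
prefix=0
for octet in $(echo "$mask" | tr '.' ' '); do
  while [ "$octet" -gt 0 ]; do
    prefix=$((prefix + (octet & 1)))
    octet=$((octet >> 1))
  done
done
echo "10.128.0.0/${prefix}"   # -> 10.128.0.0/16
# Then, instead of the interface name, one could try:
#   mpirun --mca oob_tcp_if_include 10.128.0.0/16 -np 1 ./hello_c
```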

Here is the output of ifconfig:
$ ifconfig
eth1 Link encap:Ethernet HWaddr 00:15:17:EE:89:E1 
inet addr:10.0.251.53 Bcast:10.0.251.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:215087433 errors:0 dropped:0 overruns:0 frame:0
TX packets:2648 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:26925754883 (25.0 GiB) TX bytes:137971 (134.7 KiB)
Memory:b2c00000-b2c20000
eth2 Link encap:Ethernet HWaddr 00:02:C9:04:73:F8 
inet addr:10.0.0.4 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4892833125 errors:0 dropped:0 overruns:0 frame:0
TX packets:8708606918 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:1823986502132 (1.6 TiB) TX bytes:11957754120037 (10.8 TiB)
eth2.911 Link encap:Ethernet HWaddr 00:02:C9:04:73:F8 
inet addr:93.180.7.38 Bcast:93.180.7.63 Mask:255.255.255.224
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3746454225 errors:0 dropped:0 overruns:0 frame:0
TX packets:1131917608 errors:0 dropped:3 overruns:0 carrier:0
collisions:0 txqueuelen:0 
RX bytes:285174723322 (265.5 GiB) TX bytes:11523163526058 (10.4 TiB)
eth3 Link encap:Ethernet HWaddr 00:02:C9:04:73:F9 
inet addr:10.2.251.14 Bcast:10.2.251.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:591156692 errors:0 dropped:56 overruns:56 frame:56
TX packets:679729229 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:324195989293 (301.9 GiB) TX bytes:770299202886 (717.3 GiB)
Ifconfig uses the ioctl access method to get the full address information, 
which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed 
correctly.
Ifconfig is obsolete! For replacement check ip.
ib0 Link encap:InfiniBand HWaddr 
80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 
inet addr:10.128.0.4 Bcast:10.128.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:10843859 errors:0 dropped:0 overruns:0 frame:0
TX packets:8089839 errors:0 dropped:15 overruns:0 carrier:0
collisions:0 txqueuelen:1024 
RX bytes:939249464 (895.7 MiB) TX bytes:886054008 (845.0 MiB)
lo Link encap:Local Loopback 
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:31235107 errors:0 dropped:0 overruns:0 frame:0
TX packets:31235107 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0 
RX bytes:132750916041 (123.6 GiB) TX bytes:132750916041 (123.6 GiB)
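As the ifconfig output itself notes, ifconfig truncates the 20-byte InfiniBand hardware address; the equivalent iproute2 queries (assuming `ip` is installed, which the tool recommends as the replacement) would be:

```shell
# iproute2 equivalents of the ifconfig output above; "ip link" prints the
# full 20-byte InfiniBand hardware address that ifconfig truncates.
ip -4 addr show dev ib0   # IPv4 address and mask in CIDR form, e.g. 10.128.0.4/16
ip link show dev ib0      # link-layer info, including the full hardware address
```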



Tue, 26 Aug 2014 09:48:35 -0700 from Ralph Castain <r...@open-mpi.org>:
>I think something may be messed up with your installation. I went ahead and 
>tested this on a Slurm 2.5.4 cluster, and got the following:
>
>$ time mpirun -np 1 --host bend001 ./hello
>Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12
>
>real 0m0.086s
>user 0m0.039s
>sys 0m0.046s
>
>$ time mpirun -np 1 --host bend002 ./hello
>Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12
>
>real 0m0.528s
>user 0m0.021s
>sys 0m0.023s
>
>Which is what I would have expected. With --host set to the local host, no 
>daemons are being launched and so the time is quite short (just spent mapping 
>and fork/exec). With --host set to a single remote host, you have the time it 
>takes Slurm to launch our daemon on the remote host, so you get about half of 
>a second.
>
>IIRC, you were having some problems with the OOB setup. If you specify the TCP 
>interface to use, does your time come down?
>
>
>On Aug 26, 2014, at 8:32 AM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>I'm using Slurm 2.5.6
>>
>>$salloc -N8 --exclusive -J ompi -p test
>>$ srun hostname
>>node1-128-21
>>node1-128-24
>>node1-128-22
>>node1-128-26
>>node1-128-27
>>node1-128-20
>>node1-128-25
>>node1-128-23
>>$ time mpirun -np 1 --host node1-128-21 ./hello_c
>>Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI 
>>semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 
>>21, 2014 (nightly snapshot tarball), 146)
>>real 1m3.932s
>>user 0m0.035s
>>sys 0m0.072s
>>
>>
>>Tue, 26 Aug 2014 07:03:58 -0700 from Ralph Castain < r...@open-mpi.org >:
>>>hmmm....what is your allocation like? do you have a large hostfile, for 
>>>example?
>>>
>>>if you add a --host argument that contains just the local host, what is the 
>>>time for that scenario?
>>>
>>>On Aug 26, 2014, at 6:27 AM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>>>Hello!
>>>>Here are my timing results:
>>>>$time mpirun -n 1 ./hello_c
>>>>Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI 
>>>>semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 
>>>>21, 2014 (nightly snapshot tarball), 146)
>>>>real 1m3.985s
>>>>user 0m0.031s
>>>>sys 0m0.083s
>>>>
>>>>
>>>>Fri, 22 Aug 2014 07:43:03 -0700 from Ralph Castain < r...@open-mpi.org >:
>>>>>I'm also puzzled by your timing statement - I can't replicate it:
>>>>>
>>>>>07:41:43    $ time mpirun -n 1 ./hello_c
>>>>>Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI rhc@bend001 
>>>>>Distribution, ident: 1.9a1r32577, repo rev: r32577, Unreleased developer 
>>>>>copy, 125)
>>>>>
>>>>>real 0m0.547s
>>>>>user 0m0.043s
>>>>>sys 0m0.046s
>>>>>
>>>>>The entire thing ran in 0.5 seconds
>>>>>
>>>>>
>>>>>On Aug 22, 2014, at 6:33 AM, Mike Dubman < mi...@dev.mellanox.co.il > 
>>>>>wrote:
>>>>>>Hi,
>>>>>>The default delimiter is ";" . You can change delimiter with 
>>>>>>mca_base_env_list_delimiter.
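Concretely (my reading of Mike's note above, not verified on the cluster): with a comma-delimited list, everything after the first "=" is handed to MXM_SHM_KCOPY_MODE as one value, which is exactly the MXM error reported below. Quoting the list with ";" separators should split it as intended:

```shell
# Semicolon is the default mca_base_env_list delimiter; quote the argument so
# the shell does not treat ";" as a command separator. The comma-delimited form
# below passes the literal string 'off,OMP_NUM_THREADS=8' to MXM_SHM_KCOPY_MODE.
mpirun --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off;OMP_NUM_THREADS=8' \
       --map-by slot:pe=8 -np 1 ./hello_c
```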
>>>>>>
>>>>>>
>>>>>>
>>>>>>On Fri, Aug 22, 2014 at 2:59 PM, Timur Ismagilov   < tismagi...@mail.ru > 
>>>>>>  wrote:
>>>>>>>Hello!
>>>>>>>If I use the latest nightly snapshot:
>>>>>>>$ ompi_info -V
>>>>>>>Open MPI v1.9a1r32570
>>>>>>>*  In the program hello_c, initialization takes ~1 min.
>>>>>>>In ompi 1.8.2rc4 and earlier it took ~1 sec (or less).
>>>>>>>*  If I use
>>>>>>>$mpirun  --mca mca_base_env_list 
>>>>>>>'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 
>>>>>>>./hello_c
>>>>>>>I get this error:
>>>>>>>config_parser.c:657  MXM  ERROR Invalid value for SHM_KCOPY_MODE: 
>>>>>>>'off,OMP_NUM_THREADS=8'. Expected: [off|knem|cma|autodetect]
>>>>>>>but with -x everything works fine (though with a warning):
>>>>>>>$mpirun  -x MXM_SHM_KCOPY_MODE=off -x OMP_NUM_THREADS=8 -np 1 ./hello_c
>>>>>>>WARNING: The mechanism by which environment variables are explicitly
>>>>>>>..............
>>>>>>>..............
>>>>>>>..............
>>>>>>>Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI 
>>>>>>>semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, 
>>>>>>>Aug 21, 2014 (nightly snapshot tarball), 146)
>>>>>>>Thu, 21 Aug 2014 06:26:13 -0700 from Ralph Castain < r...@open-mpi.org >:
>>>>>>>>Not sure I understand. The problem has been fixed in both the trunk and 
>>>>>>>>the 1.8 branch now, so you should be able to work with either of those 
>>>>>>>>nightly builds.
>>>>>>>>
>>>>>>>>On Aug 21, 2014, at 12:02 AM, Timur Ismagilov < tismagi...@mail.ru > 
>>>>>>>>wrote:
>>>>>>>>>Is there any way for me to run MPI jobs?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain < r...@open-mpi.org >:
>>>>>>>>>>yes, i know - it is cmr'd
>>>>>>>>>>
>>>>>>>>>>On Aug 20, 2014, at 10:26 AM, Mike Dubman < mi...@dev.mellanox.co.il 
>>>>>>>>>>> wrote:
>>>>>>>>>>>btw, we get same error in v1.8 branch as well.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain   < r...@open-mpi.org 
>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>It was not yet fixed - but should be now.
>>>>>>>>>>>>
>>>>>>>>>>>>On Aug 20, 2014, at 6:39 AM, Timur Ismagilov < tismagi...@mail.ru > 
>>>>>>>>>>>>wrote:
>>>>>>>>>>>>>Hello!
>>>>>>>>>>>>>
>>>>>>>>>>>>>As far as I can see, the bug is fixed, but in Open MPI v1.9a1r32516 I 
>>>>>>>>>>>>>still have the problem
>>>>>>>>>>>>>
>>>>>>>>>>>>>a)
>>>>>>>>>>>>>$ mpirun  -np 1 ./hello_c
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>of factors, including an inability to create a connection back
>>>>>>>>>>>>>to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>route found between them. Please check network connectivity
>>>>>>>>>>>>>(including firewalls and network routing requirements).
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>b)
>>>>>>>>>>>>>$ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>of factors, including an inability to create a connection back
>>>>>>>>>>>>>to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>route found between them. Please check network connectivity
>>>>>>>>>>>>>(including firewalls and network routing requirements).
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>>c)
>>>>>>>>>>>>>
>>>>>>>>>>>>>$ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca 
>>>>>>>>>>>>>plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 
>>>>>>>>>>>>>10 -np 1 ./hello_c
>>>>>>>>>>>>>[compiler-2:14673] mca:base:select:( plm) Querying component 
>>>>>>>>>>>>>[isolated]
>>>>>>>>>>>>>[compiler-2:14673] mca:base:select:( plm) Query of component 
>>>>>>>>>>>>>[isolated] set priority to 0
>>>>>>>>>>>>>[compiler-2:14673] mca:base:select:( plm) Querying component [rsh]
>>>>>>>>>>>>>[compiler-2:14673] mca:base:select:( plm) Query of component [rsh] 
>>>>>>>>>>>>>set priority to 10
>>>>>>>>>>>>>[compiler-2:14673] mca:base:select:( plm) Querying component 
>>>>>>>>>>>>>[slurm]
>>>>>>>>>>>>>[compiler-2:14673] mca:base:select:( plm) Query of component 
>>>>>>>>>>>>>[slurm] set priority to 75
>>>>>>>>>>>>>[compiler-2:14673] mca:base:select:( plm) Selected component 
>>>>>>>>>>>>>[slurm]
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_register: registering oob 
>>>>>>>>>>>>>components
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_register: found loaded 
>>>>>>>>>>>>>component tcp
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_register: component tcp 
>>>>>>>>>>>>>register function successful
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_open: opening oob 
>>>>>>>>>>>>>components
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_open: found loaded 
>>>>>>>>>>>>>component tcp
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_open: component tcp open 
>>>>>>>>>>>>>function successful
>>>>>>>>>>>>>[compiler-2:14673] mca:oob:select: checking available component tcp
>>>>>>>>>>>>>[compiler-2:14673] mca:oob:select: Querying component [tcp]
>>>>>>>>>>>>>[compiler-2:14673] oob:tcp: component_available called
>>>>>>>>>>>>>[compiler-2:14673] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>>>>>>>>>>>[compiler-2:14673] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>>>>>>>>>>>[compiler-2:14673] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>>>>>>>>>>>[compiler-2:14673] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>>>>>>>>>>>[compiler-2:14673] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] oob:tcp:init adding 10.128.0.4 to 
>>>>>>>>>>>>>our list of V4 connections
>>>>>>>>>>>>>[compiler-2:14673] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] TCP STARTUP
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] attempting to bind to IPv4 port 0
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] assigned IPv4 port 59460
>>>>>>>>>>>>>[compiler-2:14673] mca:oob:select: Adding component to end
>>>>>>>>>>>>>[compiler-2:14673] mca:oob:select: Found 1 active transports
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_register: registering rml 
>>>>>>>>>>>>>components
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_register: found loaded 
>>>>>>>>>>>>>component oob
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_register: component oob 
>>>>>>>>>>>>>has no register or open function
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_open: opening rml 
>>>>>>>>>>>>>components
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_open: found loaded 
>>>>>>>>>>>>>component oob
>>>>>>>>>>>>>[compiler-2:14673] mca: base: components_open: component oob open 
>>>>>>>>>>>>>function successful
>>>>>>>>>>>>>[compiler-2:14673] orte_rml_base_select: initializing rml 
>>>>>>>>>>>>>component oob
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 30 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 15 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 32 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 33 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 5 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 10 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 12 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 9 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 34 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 2 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 21 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 22 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 45 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 46 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 1 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] posting persistent recv on tag 27 
>>>>>>>>>>>>>for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>Daemon was launched on node1-128-01 - beginning to initialize
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>WARNING: An invalid value was given for oob_tcp_if_include. This
>>>>>>>>>>>>>value will be ignored.
>>>>>>>>>>>>>Local host: node1-128-01
>>>>>>>>>>>>>Value: "ib0"
>>>>>>>>>>>>>Message: Invalid specification (missing "/")
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>None of the TCP networks specified to be included for out-of-band 
>>>>>>>>>>>>>communications
>>>>>>>>>>>>>could be found:
>>>>>>>>>>>>>Value given:
>>>>>>>>>>>>>Please revise the specification and try again.
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>No network interfaces were found for out-of-band communications. 
>>>>>>>>>>>>>We require
>>>>>>>>>>>>>at least one available network for out-of-band messaging.
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>It looks like orte_init failed for some reason; your parallel 
>>>>>>>>>>>>>process is
>>>>>>>>>>>>>likely to abort. There are many reasons that a parallel process can
>>>>>>>>>>>>>fail during orte_init; some of which are due to configuration or
>>>>>>>>>>>>>environment problems. This failure appears to be an internal 
>>>>>>>>>>>>>failure;
>>>>>>>>>>>>>here's some additional information (which may only be relevant to 
>>>>>>>>>>>>>an
>>>>>>>>>>>>>Open MPI developer):
>>>>>>>>>>>>>orte_oob_base_select failed
>>>>>>>>>>>>>--> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>srun: error: node1-128-01: task 0: Exited with exit code 213
>>>>>>>>>>>>>srun: Terminating job step 661215.0
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>of factors, including an inability to create a connection back
>>>>>>>>>>>>>to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>route found between them. Please check network connectivity
>>>>>>>>>>>>>(including firewalls and network routing requirements).
>>>>>>>>>>>>>--------------------------------------------------------------------------
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] orted_cmd: received halt_vm cmd
>>>>>>>>>>>>>[compiler-2:14673] mca: base: close: component oob closed
>>>>>>>>>>>>>[compiler-2:14673] mca: base: close: unloading component oob
>>>>>>>>>>>>>[compiler-2:14673] [[49095,0],0] TCP SHUTDOWN
>>>>>>>>>>>>>[compiler-2:14673] mca: base: close: component tcp closed
>>>>>>>>>>>>>[compiler-2:14673] mca: base: close: unloading component tcp
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>Tue, 12 Aug 2014 18:33:24 +0000 from "Jeff Squyres (jsquyres)" < 
>>>>>>>>>>>>>jsquy...@cisco.com >:
>>>>>>>>>>>>>>I filed the following ticket:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     https://svn.open-mpi.org/trac/ompi/ticket/4857
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>On Aug 12, 2014, at 12:39 PM, Jeff Squyres (jsquyres) < 
>>>>>>>>>>>>>>jsquy...@cisco.com > wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (please keep the users list CC'ed)
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> We talked about this on the weekly engineering call today. 
>>>>>>>>>>>>>>> Ralph has an idea what is happening -- I need to do a little 
>>>>>>>>>>>>>>> investigation today and file a bug. I'll make sure you're CC'ed 
>>>>>>>>>>>>>>> on the bug ticket.
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> On Aug 12, 2014, at 12:27 PM, Timur Ismagilov < 
>>>>>>>>>>>>>>> tismagi...@mail.ru > wrote:
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>> I don't have this error in OMPI 1.9a1r32252 or OMPI 1.8.1 
>>>>>>>>>>>>>>>>>> (with --mca oob_tcp_if_include ib0), but with all recent nightly 
>>>>>>>>>>>>>>>>>> snapshots I get this error.
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>> Tue, 12 Aug 2014 13:08:12 +0000 from "Jeff Squyres (jsquyres)" < 
>>>>>>>>>>>>>>>> jsquy...@cisco.com >:
>>>>>>>>>>>>>>>> Are you running any kind of firewall on the node where mpirun 
>>>>>>>>>>>>>>>> is invoked? Open MPI needs to be able to use arbitrary TCP 
>>>>>>>>>>>>>>>> ports between the servers on which it runs.
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> This second mail seems to imply a bug in OMPI's 
>>>>>>>>>>>>>>>> oob_tcp_if_include param handling, however -- it's supposed to 
>>>>>>>>>>>>>>>> be able to handle an interface name (not just a network 
>>>>>>>>>>>>>>>> specification).
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> Ralph -- can you have a look?
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> On Aug 12, 2014, at 8:41 AM, Timur Ismagilov < 
>>>>>>>>>>>>>>>> tismagi...@mail.ru > wrote:
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>>> When I add --mca oob_tcp_if_include ib0 (the InfiniBand 
>>>>>>>>>>>>>>>>>>> interface) to mpirun (as suggested here:   
>>>>>>>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2014/07/24857.php
>>>>>>>>>>>>>>>>>>>    ) I get this output:
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:base:select:( plm) Querying component 
>>>>>>>>>>>>>>>>> [isolated]
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:base:select:( plm) Query of component 
>>>>>>>>>>>>>>>>> [isolated] set priority to 0
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:base:select:( plm) Querying component 
>>>>>>>>>>>>>>>>> [rsh]
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:base:select:( plm) Query of component 
>>>>>>>>>>>>>>>>> [rsh] set priority to 10
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:base:select:( plm) Querying component 
>>>>>>>>>>>>>>>>> [slurm]
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:base:select:( plm) Query of component 
>>>>>>>>>>>>>>>>> [slurm] set priority to 75
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:base:select:( plm) Selected component 
>>>>>>>>>>>>>>>>> [slurm]
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_register: 
>>>>>>>>>>>>>>>>> registering oob components
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_register: found 
>>>>>>>>>>>>>>>>> loaded component tcp
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_register: component 
>>>>>>>>>>>>>>>>> tcp register function successful
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_open: opening oob 
>>>>>>>>>>>>>>>>> components
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_open: found loaded 
>>>>>>>>>>>>>>>>> component tcp
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_open: component tcp 
>>>>>>>>>>>>>>>>> open function successful
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:oob:select: checking available 
>>>>>>>>>>>>>>>>> component tcp
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:oob:select: Querying component [tcp]
>>>>>>>>>>>>>>>>> [compiler-2:08792] oob:tcp: component_available called
>>>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: 
>>>>>>>>>>>>>>>>> V4
>>>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: 
>>>>>>>>>>>>>>>>> V4
>>>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: 
>>>>>>>>>>>>>>>>> V4
>>>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: 
>>>>>>>>>>>>>>>>> V4
>>>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: 
>>>>>>>>>>>>>>>>> V4
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] oob:tcp:init adding 
>>>>>>>>>>>>>>>>> 10.128.0.4 to our list of V4 connections
>>>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: 
>>>>>>>>>>>>>>>>> V4
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] TCP STARTUP
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] attempting to bind to IPv4 
>>>>>>>>>>>>>>>>> port 0
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] assigned IPv4 port 53883
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:oob:select: Adding component to end
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca:oob:select: Found 1 active transports
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_register: 
>>>>>>>>>>>>>>>>> registering rml components
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_register: found 
>>>>>>>>>>>>>>>>> loaded component oob
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_register: component 
>>>>>>>>>>>>>>>>> oob has no register or open function
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_open: opening rml 
>>>>>>>>>>>>>>>>> components
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_open: found loaded 
>>>>>>>>>>>>>>>>> component oob
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: components_open: component oob 
>>>>>>>>>>>>>>>>> open function successful
>>>>>>>>>>>>>>>>> [compiler-2:08792] orte_rml_base_select: initializing rml 
>>>>>>>>>>>>>>>>> component oob
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 30 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 15 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 32 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 33 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 5 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 10 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 12 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 9 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 34 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 2 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 21 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 22 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 45 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 46 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 1 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] posting persistent recv on 
>>>>>>>>>>>>>>>>> tag 27 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> Daemon was launched on node1-128-01 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon was launched on node1-128-02 - beginning to initialize
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> WARNING: An invalid value was given for oob_tcp_if_include. 
>>>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>> value will be ignored.
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Local host: node1-128-01
>>>>>>>>>>>>>>>>> Value: "ib0"
>>>>>>>>>>>>>>>>> Message: Invalid specification (missing "/")
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> WARNING: An invalid value was given for oob_tcp_if_include. 
>>>>>>>>>>>>>>>>> This
>>>>>>>>>>>>>>>>> value will be ignored.
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Local host: node1-128-02
>>>>>>>>>>>>>>>>> Value: "ib0"
>>>>>>>>>>>>>>>>> Message: Invalid specification (missing "/")
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> None of the TCP networks specified to be included for 
>>>>>>>>>>>>>>>>> out-of-band communications
>>>>>>>>>>>>>>>>> could be found:
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Value given:
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Please revise the specification and try again.
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> None of the TCP networks specified to be included for 
>>>>>>>>>>>>>>>>> out-of-band communications
>>>>>>>>>>>>>>>>> could be found:
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Value given:
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Please revise the specification and try again.
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> No network interfaces were found for out-of-band 
>>>>>>>>>>>>>>>>> communications. We require
>>>>>>>>>>>>>>>>> at least one available network for out-of-band messaging.
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> No network interfaces were found for out-of-band 
>>>>>>>>>>>>>>>>> communications. We require
>>>>>>>>>>>>>>>>> at least one available network for out-of-band messaging.
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>>>>>>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>>>>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>>>>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>>>>>>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> orte_oob_base_select failed
>>>>>>>>>>>>>>>>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>>>>>>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>>>>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>>>>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>>>>>>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> orte_oob_base_select failed
>>>>>>>>>>>>>>>>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> srun: error: node1-128-02: task 1: Exited with exit code 213
>>>>>>>>>>>>>>>>> srun: Terminating job step 657300.0
>>>>>>>>>>>>>>>>> srun: error: node1-128-01: task 0: Exited with exit code 213
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>>>>> of factors, including an inability to create a connection back
>>>>>>>>>>>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>>>>> route found between them. Please check network connectivity
>>>>>>>>>>>>>>>>> (including firewalls and network routing requirements).
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] orted_cmd: received halt_vm cmd
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: close: component oob closed
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: close: unloading component oob
>>>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] TCP SHUTDOWN
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: close: component tcp closed
>>>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: close: unloading component tcp
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> Tue, 12 Aug 2014 16:14:58 +0400 from Timur Ismagilov <tismagi...@mail.ru>:
>>>>>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> I have Open MPI v1.8.2rc4r32485
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> When I run hello_c, I get this error message:
>>>>>>>>>>>>>>>>> $ mpirun -np 2 hello_c
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>>>>> of factors, including an inability to create a connection back
>>>>>>>>>>>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>>>>> route found between them. Please check network connectivity
>>>>>>>>>>>>>>>>> (including firewalls and network routing requirements).
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> When I run with --debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10, I get this output:
>>>>>>>>>>>>>>>>> $ mpirun --debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 -np 2 hello_c
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:base:select:( plm) Querying component [isolated]
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:base:select:( plm) Query of component [isolated] set priority to 0
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:base:select:( plm) Querying component [rsh]
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:base:select:( plm) Query of component [rsh] set priority to 10
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:base:select:( plm) Querying component [slurm]
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:base:select:( plm) Query of component [slurm] set priority to 75
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:base:select:( plm) Selected component [slurm]
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_register: registering oob components
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_register: found loaded component tcp
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_register: component tcp register function successful
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_open: opening oob components
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_open: found loaded component tcp
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_open: component tcp open function successful
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:oob:select: checking available component tcp
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:oob:select: Querying component [tcp]
>>>>>>>>>>>>>>>>> [compiler-2:08780] oob:tcp: component_available called
>>>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.251.53 to our list of V4 connections
>>>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.0.4 to our list of V4 connections
>>>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.2.251.14 to our list of V4 connections
>>>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.128.0.4 to our list of V4 connections
>>>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 93.180.7.38 to our list of V4 connections
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] TCP STARTUP
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] attempting to bind to IPv4 port 0
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] assigned IPv4 port 38420
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:oob:select: Adding component to end
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca:oob:select: Found 1 active transports
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_register: registering rml components
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_register: found loaded component oob
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_register: component oob has no register or open function
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_open: opening rml components
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_open: found loaded component oob
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: components_open: component oob open function successful
>>>>>>>>>>>>>>>>> [compiler-2:08780] orte_rml_base_select: initializing rml component oob
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 30 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 15 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 32 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 33 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 5 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 10 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 12 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 9 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 34 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 2 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 21 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 22 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 45 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 46 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 1 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 27 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>>>>>>>> Daemon was launched on node1-130-08 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon was launched on node1-130-03 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon was launched on node1-130-05 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon was launched on node1-130-02 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon was launched on node1-130-01 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon was launched on node1-130-04 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon was launched on node1-130-07 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon was launched on node1-130-06 - beginning to initialize
>>>>>>>>>>>>>>>>> Daemon [[42202,0],3] checking in as pid 7178 on host node1-130-03
>>>>>>>>>>>>>>>>> [node1-130-03:07178] [[42202,0],3] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>>>> Daemon [[42202,0],2] checking in as pid 13581 on host node1-130-02
>>>>>>>>>>>>>>>>> [node1-130-02:13581] [[42202,0],2] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>>>> Daemon [[42202,0],1] checking in as pid 17220 on host node1-130-01
>>>>>>>>>>>>>>>>> [node1-130-01:17220] [[42202,0],1] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>>>> Daemon [[42202,0],5] checking in as pid 6663 on host node1-130-05
>>>>>>>>>>>>>>>>> [node1-130-05:06663] [[42202,0],5] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>>>> Daemon [[42202,0],8] checking in as pid 6683 on host node1-130-08
>>>>>>>>>>>>>>>>> [node1-130-08:06683] [[42202,0],8] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>>>> Daemon [[42202,0],7] checking in as pid 7877 on host node1-130-07
>>>>>>>>>>>>>>>>> [node1-130-07:07877] [[42202,0],7] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>>>> Daemon [[42202,0],4] checking in as pid 7735 on host node1-130-04
>>>>>>>>>>>>>>>>> [node1-130-04:07735] [[42202,0],4] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>>>> Daemon [[42202,0],6] checking in as pid 8451 on host node1-130-06
>>>>>>>>>>>>>>>>> [node1-130-06:08451] [[42202,0],6] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>>>> srun: error: node1-130-03: task 2: Exited with exit code 1
>>>>>>>>>>>>>>>>> srun: Terminating job step 657040.1
>>>>>>>>>>>>>>>>> srun: error: node1-130-02: task 1: Exited with exit code 1
>>>>>>>>>>>>>>>>> slurmd[node1-130-04]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>>>>>>>>>>>> slurmd[node1-130-07]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>>>>>>>>>>>> slurmd[node1-130-06]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>>>>>>>>>>>> srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
>>>>>>>>>>>>>>>>> srun: error: node1-130-01: task 0: Exited with exit code 1
>>>>>>>>>>>>>>>>> srun: error: node1-130-05: task 4: Exited with exit code 1
>>>>>>>>>>>>>>>>> srun: error: node1-130-08: task 7: Exited with exit code 1
>>>>>>>>>>>>>>>>> srun: error: node1-130-07: task 6: Exited with exit code 1
>>>>>>>>>>>>>>>>> srun: error: node1-130-04: task 3: Killed
>>>>>>>>>>>>>>>>> srun: error: node1-130-06: task 5: Killed
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>>>>> of factors, including an inability to create a connection back
>>>>>>>>>>>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>>>>> route found between them. Please check network connectivity
>>>>>>>>>>>>>>>>> (including firewalls and network routing requirements).
>>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] orted_cmd: received halt_vm cmd
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: close: component oob closed
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: close: unloading component oob
>>>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] TCP SHUTDOWN
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: close: component tcp closed
>>>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: close: unloading component tcp
>>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>>>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/24987.php
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>> Jeff Squyres
>>>>>>>>>>>>>>>> jsquy...@cisco.com
>>>>>>>>>>>>>>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>
>>>>>>
>>>>>>
>>>>>>--  
>>>>>>
>>>>>>Kind Regards,
>>>>>>
>>>>>>M.