How bizarre. Please add "--leave-session-attached -mca oob_base_verbose 100" to 
your command line.
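
For example, combined with the command from your mail below, that would be 
something like:

$ mpirun --leave-session-attached -mca oob_base_verbose 100 \
    --mca oob_tcp_if_include ib0 -np 1 ./hello_c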

On Aug 27, 2014, at 4:31 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:

> When I try to specify the OOB interface with --mca oob_tcp_if_include 
> <one of the interfaces from ifconfig>, I always get this error:
> 
> $ mpirun  --mca oob_tcp_if_include ib0 -np 1 ./hello_c
> --------------------------------------------------------------------------
> An ORTE daemon has unexpectedly failed after launch and before
> communicating back to mpirun. This could be caused by a number
> of factors, including an inability to create a connection back
> to mpirun due to a lack of common network interfaces and/or no
> route found between them. Please check network connectivity
> (including firewalls and network routing requirements).
> --------------------------------------------------------------------------
> 
> Earlier, with OMPI 1.8.1, I could not run MPI jobs without "--mca 
> oob_tcp_if_include ib0"; but now (OMPI 1.9a1), with this flag, I get the 
> above error.
> 
> Here is the output of ifconfig:
> 
> $ ifconfig
> eth1 Link encap:Ethernet HWaddr 00:15:17:EE:89:E1 
> inet addr:10.0.251.53 Bcast:10.0.251.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:215087433 errors:0 dropped:0 overruns:0 frame:0
> TX packets:2648 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000 
> RX bytes:26925754883 (25.0 GiB) TX bytes:137971 (134.7 KiB)
> Memory:b2c00000-b2c20000
> 
> eth2 Link encap:Ethernet HWaddr 00:02:C9:04:73:F8 
> inet addr:10.0.0.4 Bcast:10.0.0.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:4892833125 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8708606918 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000 
> RX bytes:1823986502132 (1.6 TiB) TX bytes:11957754120037 (10.8 TiB)
> 
> eth2.911 Link encap:Ethernet HWaddr 00:02:C9:04:73:F8 
> inet addr:93.180.7.38 Bcast:93.180.7.63 Mask:255.255.255.224
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:3746454225 errors:0 dropped:0 overruns:0 frame:0
> TX packets:1131917608 errors:0 dropped:3 overruns:0 carrier:0
> collisions:0 txqueuelen:0 
> RX bytes:285174723322 (265.5 GiB) TX bytes:11523163526058 (10.4 TiB)
> 
> eth3 Link encap:Ethernet HWaddr 00:02:C9:04:73:F9 
> inet addr:10.2.251.14 Bcast:10.2.251.255 Mask:255.255.255.0
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:591156692 errors:0 dropped:56 overruns:56 frame:56
> TX packets:679729229 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000 
> RX bytes:324195989293 (301.9 GiB) TX bytes:770299202886 (717.3 GiB)
> 
> Ifconfig uses the ioctl access method to get the full address information, 
> which limits hardware addresses to 8 bytes.
> Because Infiniband address has 20 bytes, only the first 8 bytes are displayed 
> correctly.
> Ifconfig is obsolete! For replacement check ip.
> ib0 Link encap:InfiniBand HWaddr 
> 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 
> inet addr:10.128.0.4 Bcast:10.128.255.255 Mask:255.255.0.0
> UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
> RX packets:10843859 errors:0 dropped:0 overruns:0 frame:0
> TX packets:8089839 errors:0 dropped:15 overruns:0 carrier:0
> collisions:0 txqueuelen:1024 
> RX bytes:939249464 (895.7 MiB) TX bytes:886054008 (845.0 MiB)
> 
> lo Link encap:Local Loopback 
> inet addr:127.0.0.1 Mask:255.0.0.0
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:31235107 errors:0 dropped:0 overruns:0 frame:0
> TX packets:31235107 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0 
> RX bytes:132750916041 (123.6 GiB) TX bytes:132750916041 (123.6 GiB)
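> 
> (Per the ifconfig warning above, the ib0 hardware address is shown 
> truncated; the suggested replacement tool prints it in full, e.g.:
> 
> $ ip addr show ib0
> 
> The inet addresses, which are what the TCP OOB uses, are displayed 
> correctly either way.)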
> 
> 
> 
> 
> Tue, 26 Aug 2014 09:48:35 -0700 from Ralph Castain <r...@open-mpi.org>:
> 
> I think something may be messed up with your installation. I went ahead and 
> tested this on a Slurm 2.5.4 cluster, and got the following:
> 
> $ time mpirun -np 1 --host bend001 ./hello
> Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12
> 
> real  0m0.086s
> user  0m0.039s
> sys   0m0.046s
> 
> $ time mpirun -np 1 --host bend002 ./hello
> Hello, World, I am 0 of 1 [0 local peers]: get_cpubind: 0 bitmap 0,12
> 
> real  0m0.528s
> user  0m0.021s
> sys   0m0.023s
> 
> Which is what I would have expected. With --host set to the local host, no 
> daemons are launched, so the time is quite short (just the time spent 
> mapping and fork/exec'ing). With --host set to a single remote host, you 
> add the time it takes Slurm to launch our daemon on the remote host, so you 
> get about half a second.
> 
> IIRC, you were having some problems with the OOB setup. If you specify the 
> TCP interface to use, does your time come down?
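> 
> Something like this, for example (eth1 is just an illustration -- use 
> whichever interface is routable between mpirun and the compute nodes):
> 
> $ time mpirun --mca oob_tcp_if_include eth1 -np 1 --host node1-128-21 ./hello_c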
> 
> 
> On Aug 26, 2014, at 8:32 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
> 
>> I'm using slurm 2.5.6
>> 
>> $salloc -N8 --exclusive -J ompi -p test
>> 
>> $ srun hostname
>> node1-128-21
>> node1-128-24
>> node1-128-22
>> node1-128-26
>> node1-128-27
>> node1-128-20
>> node1-128-25
>> node1-128-23
>> 
>> $ time mpirun -np 1 --host node1-128-21 ./hello_c
>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI 
>> semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 
>> 21, 2014 (nightly snapshot tarball), 146)
>> 
>> real 1m3.932s
>> user 0m0.035s
>> sys 0m0.072s
>> 
>> 
>> 
>> 
>> Tue, 26 Aug 2014 07:03:58 -0700 from Ralph Castain <r...@open-mpi.org>:
>> Hmmm... what is your allocation like? Do you have a large hostfile, for 
>> example?
>> 
>> If you add a --host argument that contains just the local host, what is 
>> the time for that scenario?
>> 
>> On Aug 26, 2014, at 6:27 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>> 
>>> Hello!
>>> Here are my timing results:
>>> 
>>> $time mpirun -n 1 ./hello_c
>>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI 
>>> semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 
>>> 21, 2014 (nightly snapshot tarball), 146)
>>> 
>>> real 1m3.985s
>>> user 0m0.031s
>>> sys 0m0.083s
>>> 
>>> 
>>> 
>>> 
>>> Fri, 22 Aug 2014 07:43:03 -0700 from Ralph Castain <r...@open-mpi.org>:
>>> I'm also puzzled by your timing statement - I can't replicate it:
>>> 
>>> 07:41:43  $ time mpirun -n 1 ./hello_c
>>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI rhc@bend001 
>>> Distribution, ident: 1.9a1r32577, repo rev: r32577, Unreleased developer 
>>> copy, 125)
>>> 
>>> real        0m0.547s
>>> user        0m0.043s
>>> sys 0m0.046s
>>> 
>>> The entire thing ran in 0.5 seconds
>>> 
>>> 
>>> On Aug 22, 2014, at 6:33 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>> 
>>>> Hi,
>>>> The default delimiter is ";". You can change the delimiter with 
>>>> mca_base_env_list_delimiter.
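>>>> 
>>>> So your command below should work with ";" in place of ",", e.g.:
>>>> 
>>>> $ mpirun --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off;OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 ./hello_c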
>>>> 
>>>> 
>>>> 
>>>> On Fri, Aug 22, 2014 at 2:59 PM, Timur Ismagilov <tismagi...@mail.ru> 
>>>> wrote:
>>>> Hello!
>>>> If I use the latest nightly snapshot:
>>>> $ ompi_info -V
>>>> Open MPI v1.9a1r32570
>>>> 
>>>> then initialization in the hello_c program takes ~1 min.
>>>> In OMPI 1.8.2rc4 and earlier it takes ~1 sec (or less).
>>>> If I use
>>>> $ mpirun --mca mca_base_env_list 
>>>> 'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 
>>>> ./hello_c
>>>> I get this error:
>>>> config_parser.c:657  MXM  ERROR Invalid value for SHM_KCOPY_MODE: 
>>>> 'off,OMP_NUM_THREADS=8'. Expected: [off|knem|cma|autodetect]
>>>> but with -x everything works fine (though with a warning):
>>>> $mpirun  -x MXM_SHM_KCOPY_MODE=off -x OMP_NUM_THREADS=8 -np 1 ./hello_c
>>>> WARNING: The mechanism by which environment variables are explicitly
>>>> ..............
>>>> ..............
>>>> ..............
>>>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI 
>>>> semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 
>>>> 21, 2014 (nightly snapshot tarball), 146)
>>>> 
>>>> 
>>>> Thu, 21 Aug 2014 06:26:13 -0700 from Ralph Castain <r...@open-mpi.org>:
>>>> Not sure I understand. The problem has been fixed in both the trunk and 
>>>> the 1.8 branch now, so you should be able to work with either of those 
>>>> nightly builds.
>>>> 
>>>> On Aug 21, 2014, at 12:02 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>> 
>>>>> Is there any way for me to run MPI jobs in the meantime?
>>>>> 
>>>>> 
>>>>> Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain <r...@open-mpi.org>:
>>>>> Yes, I know - it has been CMR'd.
>>>>> 
>>>>> On Aug 20, 2014, at 10:26 AM, Mike Dubman <mi...@dev.mellanox.co.il> 
>>>>> wrote:
>>>>> 
>>>>>> BTW, we get the same error in the v1.8 branch as well.
>>>>>> 
>>>>>> 
>>>>>> On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>> It was not yet fixed - but should be now.
>>>>>> 
>>>>>> On Aug 20, 2014, at 6:39 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>> 
>>>>>>> Hello!
>>>>>>> 
>>>>>>> As far as I can see, the bug is fixed, but in Open MPI v1.9a1r32516 I 
>>>>>>> still have the problem:
>>>>>>> 
>>>>>>> a)
>>>>>>> $ mpirun  -np 1 ./hello_c
>>>>>>> 
>>>>>>> --------------------------------------------------------------------------
>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>> of factors, including an inability to create a connection back
>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>> route found between them. Please check network connectivity
>>>>>>> (including firewalls and network routing requirements).
>>>>>>> --------------------------------------------------------------------------
>>>>>>> 
>>>>>>> b)
>>>>>>> $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
>>>>>>> --------------------------------------------------------------------------
>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>> of factors, including an inability to create a connection back
>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>> route found between them. Please check network connectivity
>>>>>>> (including firewalls and network routing requirements).
>>>>>>> --------------------------------------------------------------------------
>>>>>>> 
>>>>>>> c)
>>>>>>> 
>>>>>>> $ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca 
>>>>>>> plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 
>>>>>>> -np 1 ./hello_c
>>>>>>> 
>>>>>>> [compiler-2:14673] mca:base:select:( plm) Querying component [isolated]
>>>>>>> [compiler-2:14673] mca:base:select:( plm) Query of component [isolated] 
>>>>>>> set priority to 0
>>>>>>> [compiler-2:14673] mca:base:select:( plm) Querying component [rsh]
>>>>>>> [compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set 
>>>>>>> priority to 10
>>>>>>> [compiler-2:14673] mca:base:select:( plm) Querying component [slurm]
>>>>>>> [compiler-2:14673] mca:base:select:( plm) Query of component [slurm] 
>>>>>>> set priority to 75
>>>>>>> [compiler-2:14673] mca:base:select:( plm) Selected component [slurm]
>>>>>>> [compiler-2:14673] mca: base: components_register: registering oob 
>>>>>>> components
>>>>>>> [compiler-2:14673] mca: base: components_register: found loaded 
>>>>>>> component tcp
>>>>>>> [compiler-2:14673] mca: base: components_register: component tcp 
>>>>>>> register function successful
>>>>>>> [compiler-2:14673] mca: base: components_open: opening oob components
>>>>>>> [compiler-2:14673] mca: base: components_open: found loaded component 
>>>>>>> tcp
>>>>>>> [compiler-2:14673] mca: base: components_open: component tcp open 
>>>>>>> function successful
>>>>>>> [compiler-2:14673] mca:oob:select: checking available component tcp
>>>>>>> [compiler-2:14673] mca:oob:select: Querying component [tcp]
>>>>>>> [compiler-2:14673] oob:tcp: component_available called
>>>>>>> [compiler-2:14673] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>>>>> [compiler-2:14673] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>>>>> [compiler-2:14673] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>>>>> [compiler-2:14673] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>>>>> [compiler-2:14673] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>>>>> [compiler-2:14673] [[49095,0],0] oob:tcp:init adding 10.128.0.4 to our 
>>>>>>> list of V4 connections
>>>>>>> [compiler-2:14673] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>>>>> [compiler-2:14673] [[49095,0],0] TCP STARTUP
>>>>>>> [compiler-2:14673] [[49095,0],0] attempting to bind to IPv4 port 0
>>>>>>> [compiler-2:14673] [[49095,0],0] assigned IPv4 port 59460
>>>>>>> [compiler-2:14673] mca:oob:select: Adding component to end
>>>>>>> [compiler-2:14673] mca:oob:select: Found 1 active transports
>>>>>>> [compiler-2:14673] mca: base: components_register: registering rml 
>>>>>>> components
>>>>>>> [compiler-2:14673] mca: base: components_register: found loaded 
>>>>>>> component oob
>>>>>>> [compiler-2:14673] mca: base: components_register: component oob has no 
>>>>>>> register or open function
>>>>>>> [compiler-2:14673] mca: base: components_open: opening rml components
>>>>>>> [compiler-2:14673] mca: base: components_open: found loaded component 
>>>>>>> oob
>>>>>>> [compiler-2:14673] mca: base: components_open: component oob open 
>>>>>>> function successful
>>>>>>> [compiler-2:14673] orte_rml_base_select: initializing rml component oob
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 30 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 15 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 32 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 33 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 5 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 10 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 12 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 9 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 34 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 2 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 21 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 22 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 45 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 46 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 1 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 27 for 
>>>>>>> peer [[WILDCARD],WILDCARD]
>>>>>>> Daemon was launched on node1-128-01 - beginning to initialize
>>>>>>> --------------------------------------------------------------------------
>>>>>>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>>>>>> value will be ignored.
>>>>>>> 
>>>>>>> Local host: node1-128-01
>>>>>>> Value: "ib0"
>>>>>>> Message: Invalid specification (missing "/")
>>>>>>> --------------------------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> None of the TCP networks specified to be included for out-of-band 
>>>>>>> communications
>>>>>>> could be found:
>>>>>>> 
>>>>>>> Value given:
>>>>>>> 
>>>>>>> Please revise the specification and try again.
>>>>>>> --------------------------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> No network interfaces were found for out-of-band communications. We 
>>>>>>> require
>>>>>>> at least one available network for out-of-band messaging.
>>>>>>> --------------------------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>> 
>>>>>>> orte_oob_base_select failed
>>>>>>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>>>>> --------------------------------------------------------------------------
>>>>>>> srun: error: node1-128-01: task 0: Exited with exit code 213
>>>>>>> srun: Terminating job step 661215.0
>>>>>>> --------------------------------------------------------------------------
>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>> of factors, including an inability to create a connection back
>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>> route found between them. Please check network connectivity
>>>>>>> (including firewalls and network routing requirements).
>>>>>>> --------------------------------------------------------------------------
>>>>>>> [compiler-2:14673] [[49095,0],0] orted_cmd: received halt_vm cmd
>>>>>>> [compiler-2:14673] mca: base: close: component oob closed
>>>>>>> [compiler-2:14673] mca: base: close: unloading component oob
>>>>>>> [compiler-2:14673] [[49095,0],0] TCP SHUTDOWN
>>>>>>> [compiler-2:14673] mca: base: close: component tcp closed
>>>>>>> [compiler-2:14673] mca: base: close: unloading component tcp
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Tue, 12 Aug 2014 18:33:24 +0000 from "Jeff Squyres (jsquyres)" 
>>>>>>> <jsquy...@cisco.com>:
>>>>>>> I filed the following ticket:
>>>>>>> 
>>>>>>>     https://svn.open-mpi.org/trac/ompi/ticket/4857
>>>>>>> 
>>>>>>> 
>>>>>>> On Aug 12, 2014, at 12:39 PM, Jeff Squyres (jsquyres) 
>>>>>>> <jsquy...@cisco.com> wrote:
>>>>>>> 
>>>>>>> > (please keep the users list CC'ed)
>>>>>>> > 
>>>>>>> > We talked about this on the weekly engineering call today. Ralph has 
>>>>>>> > an idea what is happening -- I need to do a little investigation 
>>>>>>> > today and file a bug. I'll make sure you're CC'ed on the bug ticket.
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > On Aug 12, 2014, at 12:27 PM, Timur Ismagilov <tismagi...@mail.ru> 
>>>>>>> > wrote:
>>>>>>> > 
>>>>>>> >> I don't have this error in OMPI 1.9a1r32252 or OMPI 1.8.1 (with 
>>>>>>> >> --mca oob_tcp_if_include ib0), but in all the latest nightly 
>>>>>>> >> snapshots I get this error.
>>>>>>> >> 
>>>>>>> >> 
>>>>>>> >> Tue, 12 Aug 2014 13:08:12 +0000 from "Jeff Squyres (jsquyres)" 
>>>>>>> >> <jsquy...@cisco.com>:
>>>>>>> >> Are you running any kind of firewall on the node where mpirun is 
>>>>>>> >> invoked? Open MPI needs to be able to use arbitrary TCP ports 
>>>>>>> >> between the servers on which it runs.
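>>>>>>> >> 
>>>>>>> >> (For example, something like "iptables -L -n" on the node running 
>>>>>>> >> mpirun and on a compute node would show whether packet filtering 
>>>>>>> >> is active.)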
>>>>>>> >> 
>>>>>>> >> This second mail seems to imply a bug in OMPI's oob_tcp_if_include 
>>>>>>> >> param handling, however -- it's supposed to be able to handle an 
>>>>>>> >> interface name (not just a network specification).
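>>>>>>> >> 
>>>>>>> >> I.e., both of these forms are supposed to be accepted (the CIDR 
>>>>>>> >> form below is just an illustration, using the ib0 network from 
>>>>>>> >> your ifconfig output):
>>>>>>> >> 
>>>>>>> >> mpirun --mca oob_tcp_if_include ib0 ...
>>>>>>> >> mpirun --mca oob_tcp_if_include 10.128.0.0/16 ...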
>>>>>>> >> 
>>>>>>> >> Ralph -- can you have a look?
>>>>>>> >> 
>>>>>>> >> 
>>>>>>> >> On Aug 12, 2014, at 8:41 AM, Timur Ismagilov <tismagi...@mail.ru> 
>>>>>>> >> wrote:
>>>>>>> >> 
>>>>>>> >>> When i add --mca oob_tcp_if_include ib0 (infiniband interface) to 
>>>>>>> >>> mpirun (as it was here: 
>>>>>>> >>> http://www.open-mpi.org/community/lists/users/2014/07/24857.php ) i 
>>>>>>> >>> got this output:
>>>>>>> >>> 
>>>>>>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component 
>>>>>>> >>> [isolated]
>>>>>>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component 
>>>>>>> >>> [isolated] set priority to 0
>>>>>>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component [rsh]
>>>>>>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component [rsh] 
>>>>>>> >>> set priority to 10
>>>>>>> >>> [compiler-2:08792] mca:base:select:( plm) Querying component [slurm]
>>>>>>> >>> [compiler-2:08792] mca:base:select:( plm) Query of component 
>>>>>>> >>> [slurm] set priority to 75
>>>>>>> >>> [compiler-2:08792] mca:base:select:( plm) Selected component [slurm]
>>>>>>> >>> [compiler-2:08792] mca: base: components_register: registering oob 
>>>>>>> >>> components
>>>>>>> >>> [compiler-2:08792] mca: base: components_register: found loaded 
>>>>>>> >>> component tcp
>>>>>>> >>> [compiler-2:08792] mca: base: components_register: component tcp 
>>>>>>> >>> register function successful
>>>>>>> >>> [compiler-2:08792] mca: base: components_open: opening oob 
>>>>>>> >>> components
>>>>>>> >>> [compiler-2:08792] mca: base: components_open: found loaded 
>>>>>>> >>> component tcp
>>>>>>> >>> [compiler-2:08792] mca: base: components_open: component tcp open 
>>>>>>> >>> function successful
>>>>>>> >>> [compiler-2:08792] mca:oob:select: checking available component tcp
>>>>>>> >>> [compiler-2:08792] mca:oob:select: Querying component [tcp]
>>>>>>> >>> [compiler-2:08792] oob:tcp: component_available called
>>>>>>> >>> [compiler-2:08792] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>>>>> >>> [compiler-2:08792] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>>>>> >>> [compiler-2:08792] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>>>>> >>> [compiler-2:08792] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>>>>> >>> [compiler-2:08792] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] oob:tcp:init adding 10.128.0.4 to 
>>>>>>> >>> our list of V4 connections
>>>>>>> >>> [compiler-2:08792] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] TCP STARTUP
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] attempting to bind to IPv4 port 0
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] assigned IPv4 port 53883
>>>>>>> >>> [compiler-2:08792] mca:oob:select: Adding component to end
>>>>>>> >>> [compiler-2:08792] mca:oob:select: Found 1 active transports
>>>>>>> >>> [compiler-2:08792] mca: base: components_register: registering rml 
>>>>>>> >>> components
>>>>>>> >>> [compiler-2:08792] mca: base: components_register: found loaded 
>>>>>>> >>> component oob
>>>>>>> >>> [compiler-2:08792] mca: base: components_register: component oob 
>>>>>>> >>> has no register or open function
>>>>>>> >>> [compiler-2:08792] mca: base: components_open: opening rml 
>>>>>>> >>> components
>>>>>>> >>> [compiler-2:08792] mca: base: components_open: found loaded 
>>>>>>> >>> component oob
>>>>>>> >>> [compiler-2:08792] mca: base: components_open: component oob open 
>>>>>>> >>> function successful
>>>>>>> >>> [compiler-2:08792] orte_rml_base_select: initializing rml component 
>>>>>>> >>> oob
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 30 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 15 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 32 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 33 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 5 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 10 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 12 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 9 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 34 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 2 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 21 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 22 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 45 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 46 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 1 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting recv
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] posting persistent recv on tag 27 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> Daemon was launched on node1-128-01 - beginning to initialize
>>>>>>> >>> Daemon was launched on node1-128-02 - beginning to initialize
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>>>>>> >>> value will be ignored.
>>>>>>> >>> 
>>>>>>> >>> Local host: node1-128-01
>>>>>>> >>> Value: "ib0"
>>>>>>> >>> Message: Invalid specification (missing "/")
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>>>>>> >>> value will be ignored.
>>>>>>> >>> 
>>>>>>> >>> Local host: node1-128-02
>>>>>>> >>> Value: "ib0"
>>>>>>> >>> Message: Invalid specification (missing "/")
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> None of the TCP networks specified to be included for out-of-band 
>>>>>>> >>> communications
>>>>>>> >>> could be found:
>>>>>>> >>> 
>>>>>>> >>> Value given:
>>>>>>> >>> 
>>>>>>> >>> Please revise the specification and try again.
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> None of the TCP networks specified to be included for out-of-band 
>>>>>>> >>> communications
>>>>>>> >>> could be found:
>>>>>>> >>> 
>>>>>>> >>> Value given:
>>>>>>> >>> 
>>>>>>> >>> Please revise the specification and try again.
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> No network interfaces were found for out-of-band communications. We 
>>>>>>> >>> require
>>>>>>> >>> at least one available network for out-of-band messaging.
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> No network interfaces were found for out-of-band communications. We 
>>>>>>> >>> require
>>>>>>> >>> at least one available network for out-of-band messaging.
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> It looks like orte_init failed for some reason; your parallel 
>>>>>>> >>> process is
>>>>>>> >>> likely to abort. There are many reasons that a parallel process can
>>>>>>> >>> fail during orte_init; some of which are due to configuration or
>>>>>>> >>> environment problems. This failure appears to be an internal 
>>>>>>> >>> failure;
>>>>>>> >>> here's some additional information (which may only be relevant to an
>>>>>>> >>> Open MPI developer):
>>>>>>> >>> 
>>>>>>> >>> orte_oob_base_select failed
>>>>>>> >>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> It looks like orte_init failed for some reason; your parallel 
>>>>>>> >>> process is
>>>>>>> >>> likely to abort. There are many reasons that a parallel process can
>>>>>>> >>> fail during orte_init; some of which are due to configuration or
>>>>>>> >>> environment problems. This failure appears to be an internal 
>>>>>>> >>> failure;
>>>>>>> >>> here's some additional information (which may only be relevant to an
>>>>>>> >>> Open MPI developer):
>>>>>>> >>> 
>>>>>>> >>> orte_oob_base_select failed
>>>>>>> >>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> srun: error: node1-128-02: task 1: Exited with exit code 213
>>>>>>> >>> srun: Terminating job step 657300.0
>>>>>>> >>> srun: error: node1-128-01: task 0: Exited with exit code 213
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>> >>> communicating back to mpirun. This could be caused by a number
>>>>>>> >>> of factors, including an inability to create a connection back
>>>>>>> >>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>> >>> route found between them. Please check network connectivity
>>>>>>> >>> (including firewalls and network routing requirements).
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] orted_cmd: received halt_vm cmd
>>>>>>> >>> [compiler-2:08792] mca: base: close: component oob closed
>>>>>>> >>> [compiler-2:08792] mca: base: close: unloading component oob
>>>>>>> >>> [compiler-2:08792] [[42190,0],0] TCP SHUTDOWN
>>>>>>> >>> [compiler-2:08792] mca: base: close: component tcp closed
>>>>>>> >>> [compiler-2:08792] mca: base: close: unloading component tcp
>>>>>>> >>> 
>>>>>>> >>> 
>>>>>>> >>> 
>>>>>>> >>> Tue, 12 Aug 2014 16:14:58 +0400 from Timur Ismagilov 
>>>>>>> >>> <tismagi...@mail.ru>:
>>>>>>> >>> Hello!
>>>>>>> >>> 
>>>>>>> >>> I have Open MPI v1.8.2rc4r32485.
>>>>>>> >>> 
>>>>>>> >>> When I run hello_c, I get this error message:
>>>>>>> >>> $ mpirun -np 2 hello_c
>>>>>>> >>> 
>>>>>>> >>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>> >>> communicating back to mpirun. This could be caused by a number
>>>>>>> >>> of factors, including an inability to create a connection back
>>>>>>> >>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>> >>> route found between them. Please check network connectivity
>>>>>>> >>> (including firewalls and network routing requirements).
>>>>>>> >>> 
>>>>>>> >>> When I run with --debug-daemons --mca plm_base_verbose 5 -mca 
>>>>>>> >>> oob_base_verbose 10 -mca rml_base_verbose 10, I get this output:
>>>>>>> >>> $ mpirun --debug-daemons --mca plm_base_verbose 5 -mca 
>>>>>>> >>> oob_base_verbose 10 -mca rml_base_verbose 10 -np 2 hello_c
>>>>>>> >>> 
>>>>>>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component 
>>>>>>> >>> [isolated]
>>>>>>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component 
>>>>>>> >>> [isolated] set priority to 0
>>>>>>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component [rsh]
>>>>>>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component [rsh] 
>>>>>>> >>> set priority to 10
>>>>>>> >>> [compiler-2:08780] mca:base:select:( plm) Querying component [slurm]
>>>>>>> >>> [compiler-2:08780] mca:base:select:( plm) Query of component 
>>>>>>> >>> [slurm] set priority to 75
>>>>>>> >>> [compiler-2:08780] mca:base:select:( plm) Selected component [slurm]
>>>>>>> >>> [compiler-2:08780] mca: base: components_register: registering oob 
>>>>>>> >>> components
>>>>>>> >>> [compiler-2:08780] mca: base: components_register: found loaded 
>>>>>>> >>> component tcp
>>>>>>> >>> [compiler-2:08780] mca: base: components_register: component tcp 
>>>>>>> >>> register function successful
>>>>>>> >>> [compiler-2:08780] mca: base: components_open: opening oob 
>>>>>>> >>> components
>>>>>>> >>> [compiler-2:08780] mca: base: components_open: found loaded 
>>>>>>> >>> component tcp
>>>>>>> >>> [compiler-2:08780] mca: base: components_open: component tcp open 
>>>>>>> >>> function successful
>>>>>>> >>> [compiler-2:08780] mca:oob:select: checking available component tcp
>>>>>>> >>> [compiler-2:08780] mca:oob:select: Querying component [tcp]
>>>>>>> >>> [compiler-2:08780] oob:tcp: component_available called
>>>>>>> >>> [compiler-2:08780] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>>>>> >>> [compiler-2:08780] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.251.53 to 
>>>>>>> >>> our list of V4 connections
>>>>>>> >>> [compiler-2:08780] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.0.4 to 
>>>>>>> >>> our list of V4 connections
>>>>>>> >>> [compiler-2:08780] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.2.251.14 to 
>>>>>>> >>> our list of V4 connections
>>>>>>> >>> [compiler-2:08780] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.128.0.4 to 
>>>>>>> >>> our list of V4 connections
>>>>>>> >>> [compiler-2:08780] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 93.180.7.38 to 
>>>>>>> >>> our list of V4 connections
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] TCP STARTUP
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] attempting to bind to IPv4 port 0
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] assigned IPv4 port 38420
>>>>>>> >>> [compiler-2:08780] mca:oob:select: Adding component to end
>>>>>>> >>> [compiler-2:08780] mca:oob:select: Found 1 active transports
>>>>>>> >>> [compiler-2:08780] mca: base: components_register: registering rml 
>>>>>>> >>> components
>>>>>>> >>> [compiler-2:08780] mca: base: components_register: found loaded 
>>>>>>> >>> component oob
>>>>>>> >>> [compiler-2:08780] mca: base: components_register: component oob 
>>>>>>> >>> has no register or open function
>>>>>>> >>> [compiler-2:08780] mca: base: components_open: opening rml 
>>>>>>> >>> components
>>>>>>> >>> [compiler-2:08780] mca: base: components_open: found loaded 
>>>>>>> >>> component oob
>>>>>>> >>> [compiler-2:08780] mca: base: components_open: component oob open 
>>>>>>> >>> function successful
>>>>>>> >>> [compiler-2:08780] orte_rml_base_select: initializing rml component 
>>>>>>> >>> oob
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 30 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 15 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 32 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 33 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 5 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 10 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 12 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 9 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 34 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 2 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 21 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 22 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 45 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 46 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 1 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting recv
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] posting persistent recv on tag 27 
>>>>>>> >>> for peer [[WILDCARD],WILDCARD]
>>>>>>> >>> Daemon was launched on node1-130-08 - beginning to initialize
>>>>>>> >>> Daemon was launched on node1-130-03 - beginning to initialize
>>>>>>> >>> Daemon was launched on node1-130-05 - beginning to initialize
>>>>>>> >>> Daemon was launched on node1-130-02 - beginning to initialize
>>>>>>> >>> Daemon was launched on node1-130-01 - beginning to initialize
>>>>>>> >>> Daemon was launched on node1-130-04 - beginning to initialize
>>>>>>> >>> Daemon was launched on node1-130-07 - beginning to initialize
>>>>>>> >>> Daemon was launched on node1-130-06 - beginning to initialize
>>>>>>> >>> Daemon [[42202,0],3] checking in as pid 7178 on host node1-130-03
>>>>>>> >>> [node1-130-03:07178] [[42202,0],3] orted: up and running - waiting 
>>>>>>> >>> for commands!
>>>>>>> >>> Daemon [[42202,0],2] checking in as pid 13581 on host node1-130-02
>>>>>>> >>> [node1-130-02:13581] [[42202,0],2] orted: up and running - waiting 
>>>>>>> >>> for commands!
>>>>>>> >>> Daemon [[42202,0],1] checking in as pid 17220 on host node1-130-01
>>>>>>> >>> [node1-130-01:17220] [[42202,0],1] orted: up and running - waiting 
>>>>>>> >>> for commands!
>>>>>>> >>> Daemon [[42202,0],5] checking in as pid 6663 on host node1-130-05
>>>>>>> >>> [node1-130-05:06663] [[42202,0],5] orted: up and running - waiting 
>>>>>>> >>> for commands!
>>>>>>> >>> Daemon [[42202,0],8] checking in as pid 6683 on host node1-130-08
>>>>>>> >>> [node1-130-08:06683] [[42202,0],8] orted: up and running - waiting 
>>>>>>> >>> for commands!
>>>>>>> >>> Daemon [[42202,0],7] checking in as pid 7877 on host node1-130-07
>>>>>>> >>> [node1-130-07:07877] [[42202,0],7] orted: up and running - waiting 
>>>>>>> >>> for commands!
>>>>>>> >>> Daemon [[42202,0],4] checking in as pid 7735 on host node1-130-04
>>>>>>> >>> [node1-130-04:07735] [[42202,0],4] orted: up and running - waiting 
>>>>>>> >>> for commands!
>>>>>>> >>> Daemon [[42202,0],6] checking in as pid 8451 on host node1-130-06
>>>>>>> >>> [node1-130-06:08451] [[42202,0],6] orted: up and running - waiting 
>>>>>>> >>> for commands!
>>>>>>> >>> srun: error: node1-130-03: task 2: Exited with exit code 1
>>>>>>> >>> srun: Terminating job step 657040.1
>>>>>>> >>> srun: error: node1-130-02: task 1: Exited with exit code 1
>>>>>>> >>> slurmd[node1-130-04]: *** STEP 657040.1 KILLED AT 
>>>>>>> >>> 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>> >>> slurmd[node1-130-07]: *** STEP 657040.1 KILLED AT 
>>>>>>> >>> 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>> >>> slurmd[node1-130-06]: *** STEP 657040.1 KILLED AT 
>>>>>>> >>> 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>> >>> srun: Job step aborted: Waiting up to 2 seconds for job step to 
>>>>>>> >>> finish.
>>>>>>> >>> srun: error: node1-130-01: task 0: Exited with exit code 1
>>>>>>> >>> srun: error: node1-130-05: task 4: Exited with exit code 1
>>>>>>> >>> srun: error: node1-130-08: task 7: Exited with exit code 1
>>>>>>> >>> srun: error: node1-130-07: task 6: Exited with exit code 1
>>>>>>> >>> srun: error: node1-130-04: task 3: Killed
>>>>>>> >>> srun: error: node1-130-06: task 5: Killed
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>> >>> communicating back to mpirun. This could be caused by a number
>>>>>>> >>> of factors, including an inability to create a connection back
>>>>>>> >>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>> >>> route found between them. Please check network connectivity
>>>>>>> >>> (including firewalls and network routing requirements).
>>>>>>> >>> --------------------------------------------------------------------------
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] orted_cmd: received halt_vm cmd
>>>>>>> >>> [compiler-2:08780] mca: base: close: component oob closed
>>>>>>> >>> [compiler-2:08780] mca: base: close: unloading component oob
>>>>>>> >>> [compiler-2:08780] [[42202,0],0] TCP SHUTDOWN
>>>>>>> >>> [compiler-2:08780] mca: base: close: component tcp closed
>>>>>>> >>> [compiler-2:08780] mca: base: close: unloading component tcp
>>>>>>> >>> 
>>>>>>> >> 
>>>>>>> >> 
>>>>>>> > 
>>>>>>> > 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> 
>>>> Kind Regards,
>>>> 
>>>> M.
>>> 