I'm using slurm 2.5.6.

$ salloc -N8 --exclusive -J ompi -p test
$ srun hostname
node1-128-21
node1-128-24
node1-128-22
node1-128-26
node1-128-27
node1-128-20
node1-128-25
node1-128-23

$ time mpirun -np 1 --host node1-128-21 ./hello_c
Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)

real    1m3.932s
user    0m0.035s
sys     0m0.072s
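Ralph's question quoted below asks for the same timing with only the local host. A minimal sketch of that check, assuming mpirun is started on compiler-2 (the host shown in the banner above) and that the launch node is accepted as a --host target inside this allocation:

$ time mpirun -np 1 --host $(hostname) ./hello_c

If this runs in well under a second while the --host node1-128-21 case above still takes about a minute, the time is probably going into launching and waiting for the remote orted daemons rather than into MPI initialization itself.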
Tue, 26 Aug 2014 07:03:58 -0700 from Ralph Castain <r...@open-mpi.org>:

> Hmmm... what is your allocation like? Do you have a large hostfile, for example?
>
> If you add a --host argument that contains just the local host, what is the time for that scenario?
>
> On Aug 26, 2014, at 6:27 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>> Hello!
>> Here are my timing results:
>> $ time mpirun -n 1 ./hello_c
>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)
>>
>> real    1m3.985s
>> user    0m0.031s
>> sys     0m0.083s
>>
>> Fri, 22 Aug 2014 07:43:03 -0700 from Ralph Castain <r...@open-mpi.org>:
>>> I'm also puzzled by your timing statement - I can't replicate it:
>>>
>>> 07:41:43 $ time mpirun -n 1 ./hello_c
>>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI rhc@bend001 Distribution, ident: 1.9a1r32577, repo rev: r32577, Unreleased developer copy, 125)
>>>
>>> real    0m0.547s
>>> user    0m0.043s
>>> sys     0m0.046s
>>>
>>> The entire thing ran in 0.5 seconds.
>>>
>>> On Aug 22, 2014, at 6:33 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>> Hi,
>>>> The default delimiter is ";". You can change the delimiter with mca_base_env_list_delimiter.
>>>>
>>>> On Fri, Aug 22, 2014 at 2:59 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>> Hello!
>>>>> If I use the latest nightly snapshot:
>>>>> $ ompi_info -V
>>>>> Open MPI v1.9a1r32570
>>>>> * In the hello_c program, initialization takes ~1 min. In OMPI 1.8.2rc4 and earlier it takes ~1 sec (or less).
>>>>> * If I use
>>>>> $ mpirun --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 ./hello_c
>>>>> I get the error
>>>>> config_parser.c:657  MXM  ERROR Invalid value for SHM_KCOPY_MODE: 'off,OMP_NUM_THREADS=8'. Expected: [off|knem|cma|autodetect]
>>>>> but with -x everything works fine (though with a warning):
>>>>> $ mpirun -x MXM_SHM_KCOPY_MODE=off -x OMP_NUM_THREADS=8 -np 1 ./hello_c
>>>>> WARNING: The mechanism by which environment variables are explicitly
>>>>> ..............
>>>>> ..............
>>>>> ..............
>>>>> Hello, world, I am 0 of 1, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32570, repo rev: r32570, Aug 21, 2014 (nightly snapshot tarball), 146)
>>>>>
>>>>> Thu, 21 Aug 2014 06:26:13 -0700 from Ralph Castain <r...@open-mpi.org>:
>>>>>> Not sure I understand. The problem has been fixed in both the trunk and the 1.8 branch now, so you should be able to work with either of those nightly builds.
>>>>>>
>>>>>> On Aug 21, 2014, at 12:02 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>> Do I have any way to run MPI jobs?
>>>>>>>
>>>>>>> Wed, 20 Aug 2014 10:48:38 -0700 from Ralph Castain <r...@open-mpi.org>:
>>>>>>>> Yes, I know - it is CMR'd.
>>>>>>>>
>>>>>>>> On Aug 20, 2014, at 10:26 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>>>>>>> BTW, we get the same error in the v1.8 branch as well.
>>>>>>>>>
>>>>>>>>> On Wed, Aug 20, 2014 at 8:06 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>>> It was not yet fixed - but should be now.
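For the mca_base_env_list error quoted above (MXM sees 'off,OMP_NUM_THREADS=8' as a single value), Mike's answer means the two variables have to be separated with the default ';' delimiter, quoted so the shell does not split the argument. A sketch of both variants, assuming mca_base_env_list_delimiter can be set on the mpirun command line like any other MCA parameter:

$ mpirun --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off;OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 ./hello_c
$ mpirun --mca mca_base_env_list_delimiter ',' --mca mca_base_env_list 'MXM_SHM_KCOPY_MODE=off,OMP_NUM_THREADS=8' --map-by slot:pe=8 -np 1 ./hello_c

The second form keeps the comma-separated list from the original command but tells Open MPI to treat ',' as the delimiter.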
>>>>>>>>>> On Aug 20, 2014, at 6:39 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>>>>>> Hello!
>>>>>>>>>>>
>>>>>>>>>>> As far as I can see, the bug is fixed, but in Open MPI v1.9a1r32516 I still have the problem:
>>>>>>>>>>>
>>>>>>>>>>> a)
>>>>>>>>>>> $ mpirun -np 1 ./hello_c
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>>>>>> of factors, including an inability to create a connection back
>>>>>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>> route found between them. Please check network connectivity
>>>>>>>>>>> (including firewalls and network routing requirements).
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> b)
>>>>>>>>>>> $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
>>>>>>>>>>> [the same "An ORTE daemon has unexpectedly failed..." message as in a)]
>>>>>>>>>>>
>>>>>>>>>>> c)
>>>>>>>>>>> $ mpirun --mca oob_tcp_if_include ib0 -debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 -np 1 ./hello_c
>>>>>>>>>>> [compiler-2:14673] mca:base:select:( plm) Querying component [isolated]
>>>>>>>>>>> [compiler-2:14673] mca:base:select:( plm) Query of component [isolated] set priority to 0
>>>>>>>>>>> [compiler-2:14673] mca:base:select:( plm) Querying component [rsh]
>>>>>>>>>>> [compiler-2:14673] mca:base:select:( plm) Query of component [rsh] set priority to 10
>>>>>>>>>>> [compiler-2:14673] mca:base:select:( plm) Querying component [slurm]
>>>>>>>>>>> [compiler-2:14673] mca:base:select:( plm) Query of component [slurm] set priority to 75
>>>>>>>>>>> [compiler-2:14673] mca:base:select:( plm) Selected component [slurm]
>>>>>>>>>>> [compiler-2:14673] mca: base: components_register: registering oob components
>>>>>>>>>>> [compiler-2:14673] mca: base: components_register: found loaded component tcp
>>>>>>>>>>> [compiler-2:14673] mca: base: components_register: component tcp register function successful
>>>>>>>>>>> [compiler-2:14673] mca: base: components_open: opening oob components
>>>>>>>>>>> [compiler-2:14673] mca: base: components_open: found loaded component tcp
>>>>>>>>>>> [compiler-2:14673] mca: base: components_open: component tcp open function successful
>>>>>>>>>>> [compiler-2:14673] mca:oob:select: checking available component tcp
>>>>>>>>>>> [compiler-2:14673] mca:oob:select: Querying component [tcp]
>>>>>>>>>>> [compiler-2:14673] oob:tcp: component_available called
>>>>>>>>>>> [compiler-2:14673] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>>>>>>>>> [compiler-2:14673] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>>>>>>>>> [compiler-2:14673] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>>>>>>>>> [compiler-2:14673] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>>>>>>>>> [compiler-2:14673] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>>>>>>>>> [compiler-2:14673] [[49095,0],0] oob:tcp:init adding 10.128.0.4 to our list of V4 connections
>>>>>>>>>>> [compiler-2:14673] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>>>>>>>>> [compiler-2:14673] [[49095,0],0] TCP STARTUP
>>>>>>>>>>> [compiler-2:14673] [[49095,0],0] attempting to bind to IPv4 port 0
>>>>>>>>>>> [compiler-2:14673] [[49095,0],0] assigned IPv4 port 59460
>>>>>>>>>>> [compiler-2:14673] mca:oob:select: Adding component to end
>>>>>>>>>>> [compiler-2:14673] mca:oob:select: Found 1 active transports
>>>>>>>>>>> [compiler-2:14673] mca: base: components_register: registering rml components
>>>>>>>>>>> [compiler-2:14673] mca: base: components_register: found loaded component oob
>>>>>>>>>>> [compiler-2:14673] mca: base: components_register: component oob has no register or open function
>>>>>>>>>>> [compiler-2:14673] mca: base: components_open: opening rml components
>>>>>>>>>>> [compiler-2:14673] mca: base: components_open: found loaded component oob
>>>>>>>>>>> [compiler-2:14673] mca: base: components_open: component oob open function successful
>>>>>>>>>>> [compiler-2:14673] orte_rml_base_select: initializing rml component oob
>>>>>>>>>>> [compiler-2:14673] [[49095,0],0] posting recv
>>>>>>>>>>> [compiler-2:14673] [[49095,0],0] posting persistent recv on tag 30 for peer [[WILDCARD],WILDCARD]
>>>>>>>>>>> [the same "posting recv" / "posting persistent recv" pair follows for tags 15, 32, 33, 5, 10, 12, 9, 34, 2, 21, 22, 45, 46, 1 and 27]
>>>>>>>>>>> Daemon was launched on node1-128-01 - beginning to initialize
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>>>>>>>>>> value will be ignored.
>>>>>>>>>>>
>>>>>>>>>>> Local host: node1-128-01
>>>>>>>>>>> Value: "ib0"
>>>>>>>>>>> Message: Invalid specification (missing "/")
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> None of the TCP networks specified to be included for out-of-band communications
>>>>>>>>>>> could be found:
>>>>>>>>>>>
>>>>>>>>>>> Value given:
>>>>>>>>>>>
>>>>>>>>>>> Please revise the specification and try again.
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> No network interfaces were found for out-of-band communications. We require
>>>>>>>>>>> at least one available network for out-of-band messaging.
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>
>>>>>>>>>>> orte_oob_base_select failed
>>>>>>>>>>> --> Returned value (null) (-43) instead of ORTE_SUCCESS
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> srun: error: node1-128-01: task 0: Exited with exit code 213
>>>>>>>>>>> srun: Terminating job step 661215.0
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>>>>>> of factors, including an inability to create a connection back
>>>>>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>> route found between them. Please check network connectivity
>>>>>>>>>>> (including firewalls and network routing requirements).
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> [compiler-2:14673] [[49095,0],0] orted_cmd: received halt_vm cmd
>>>>>>>>>>> [compiler-2:14673] mca: base: close: component oob closed
>>>>>>>>>>> [compiler-2:14673] mca: base: close: unloading component oob
>>>>>>>>>>> [compiler-2:14673] [[49095,0],0] TCP SHUTDOWN
>>>>>>>>>>> [compiler-2:14673] mca: base: close: component tcp closed
>>>>>>>>>>> [compiler-2:14673] mca: base: close: unloading component tcp
>>>>>>>>>>>
>>>>>>>>>>> Tue, 12 Aug 2014 18:33:24 +0000 from "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>:
>>>>>>>>>>>> I filed the following ticket:
>>>>>>>>>>>>
>>>>>>>>>>>> https://svn.open-mpi.org/trac/ompi/ticket/4857
>>>>>>>>>>>>
>>>>>>>>>>>> On Aug 12, 2014, at 12:39 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>>>>>>>>>>> (please keep the users list CC'ed)
>>>>>>>>>>>>>
>>>>>>>>>>>>> We talked about this on the weekly engineering call today. Ralph has an idea what is happening -- I need to do a little investigation today and file a bug. I'll make sure you're CC'ed on the bug ticket.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Aug 12, 2014, at 12:27 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>>>>>>>>> I don't have this error in OMPI 1.9a1r32252 or OMPI 1.8.1 (with --mca oob_tcp_if_include ib0), but in all of the latest nightly snapshots I get this error.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tue, 12 Aug 2014 13:08:12 +0000 from "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>:
>>>>>>>>>>>>>> Are you running any kind of firewall on the node where mpirun is invoked? Open MPI needs to be able to use arbitrary TCP ports between the servers on which it runs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This second mail seems to imply a bug in OMPI's oob_tcp_if_include param handling, however -- it's supposed to be able to handle an interface name (not just a network specification).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ralph -- can you have a look?
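Until the interface-name handling is fixed in the nightlies, a possible workaround is to give oob_tcp_if_include the CIDR form that the "missing '/'" warning is asking for, instead of an interface name. A sketch, assuming ib0 carries the 10.128.0.x addresses seen in these traces; the /16 prefix length is a guess and should be replaced with whatever "ip addr show ib0" reports on the nodes:

$ mpirun --mca oob_tcp_if_include 10.128.0.0/16 -np 1 ./hello_c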
>>>>>>>>>>>>>> On Aug 12, 2014, at 8:41 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>>>>>>>>>> When I add --mca oob_tcp_if_include ib0 (the InfiniBand interface) to mpirun (as was done here: http://www.open-mpi.org/community/lists/users/2014/07/24857.php ), I get this output:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [the same plm component selection (slurm chosen at priority 75) and oob/tcp component registration output as in the trace above, this time from [compiler-2:08792], job [[42190,0],0]]
>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] oob:tcp:init adding 10.128.0.4 to our list of V4 connections
>>>>>>>>>>>>>>> [compiler-2:08792] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] TCP STARTUP
>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] attempting to bind to IPv4 port 0
>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] assigned IPv4 port 53883
>>>>>>>>>>>>>>> [compiler-2:08792] mca:oob:select: Adding component to end
>>>>>>>>>>>>>>> [compiler-2:08792] mca:oob:select: Found 1 active transports
>>>>>>>>>>>>>>> [the same rml component registration and "posting recv" / "posting persistent recv" output as in the trace above, for tags 30, 15, 32, 33, 5, 10, 12, 9, 34, 2, 21, 22, 45, 46, 1 and 27]
>>>>>>>>>>>>>>> Daemon was launched on node1-128-01 - beginning to initialize
>>>>>>>>>>>>>>> Daemon was launched on node1-128-02 - beginning to initialize
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>> WARNING: An invalid value was given for oob_tcp_if_include. This
>>>>>>>>>>>>>>> value will be ignored.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Local host: node1-128-01
>>>>>>>>>>>>>>> Value: "ib0"
>>>>>>>>>>>>>>> Message: Invalid specification (missing "/")
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>> [the same warning is printed for node1-128-02, and the "None of the TCP networks specified...", "No network interfaces were found..." and "It looks like orte_init failed... orte_oob_base_select failed --> Returned value (null) (-43) instead of ORTE_SUCCESS" messages from the trace above each appear twice, once per node]
>>>>>>>>>>>>>>> srun: error: node1-128-02: task 1: Exited with exit code 213
>>>>>>>>>>>>>>> srun: Terminating job step 657300.0
>>>>>>>>>>>>>>> srun: error: node1-128-01: task 0: Exited with exit code 213
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>>> of factors, including an inability to create a connection back
>>>>>>>>>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>>> route found between them. Please check network connectivity
>>>>>>>>>>>>>>> (including firewalls and network routing requirements).
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] orted_cmd: received halt_vm cmd
>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: close: component oob closed
>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: close: unloading component oob
>>>>>>>>>>>>>>> [compiler-2:08792] [[42190,0],0] TCP SHUTDOWN
>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: close: component tcp closed
>>>>>>>>>>>>>>> [compiler-2:08792] mca: base: close: unloading component tcp
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Tue, 12 Aug 2014 16:14:58 +0400 from Timur Ismagilov <tismagi...@mail.ru>:
>>>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have Open MPI v1.8.2rc4r32485.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When I run hello_c, I get this error message:
>>>>>>>>>>>>>>> $ mpirun -np 2 hello_c
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>>> of factors, including an inability to create a connection back
>>>>>>>>>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>>> route found between them. Please check network connectivity
>>>>>>>>>>>>>>> (including firewalls and network routing requirements).
>>>>>>>>>>>>>>> When I run with --debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10, I get this output:
>>>>>>>>>>>>>>> $ mpirun --debug-daemons --mca plm_base_verbose 5 -mca oob_base_verbose 10 -mca rml_base_verbose 10 -np 2 hello_c
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [the same plm component selection (slurm chosen at priority 75) and oob/tcp component registration output as in the traces above, this time from [compiler-2:08780], job [[42202,0],0]]
>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 2 KERNEL INDEX 3 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.251.53 to our list of V4 connections
>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.0.0.4 to our list of V4 connections
>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 4 KERNEL INDEX 5 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.2.251.14 to our list of V4 connections
>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 5 KERNEL INDEX 6 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 10.128.0.4 to our list of V4 connections
>>>>>>>>>>>>>>> [compiler-2:08780] WORKING INTERFACE 6 KERNEL INDEX 7 FAMILY: V4
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] oob:tcp:init adding 93.180.7.38 to our list of V4 connections
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] TCP STARTUP
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] attempting to bind to IPv4 port 0
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] assigned IPv4 port 38420
>>>>>>>>>>>>>>> [compiler-2:08780] mca:oob:select: Adding component to end
>>>>>>>>>>>>>>> [compiler-2:08780] mca:oob:select: Found 1 active transports
>>>>>>>>>>>>>>> [the same rml component registration and "posting recv" / "posting persistent recv" output as in the traces above, for tags 30, 15, 32, 33, 5, 10, 12, 9, 34, 2, 21, 22, 45, 46, 1 and 27]
>>>>>>>>>>>>>>> Daemon was launched on node1-130-08 - beginning to initialize
>>>>>>>>>>>>>>> Daemon was launched on node1-130-03 - beginning to initialize
>>>>>>>>>>>>>>> Daemon was launched on node1-130-05 - beginning to initialize
>>>>>>>>>>>>>>> Daemon was launched on node1-130-02 - beginning to initialize
>>>>>>>>>>>>>>> Daemon was launched on node1-130-01 - beginning to initialize
>>>>>>>>>>>>>>> Daemon was launched on node1-130-04 - beginning to initialize
>>>>>>>>>>>>>>> Daemon was launched on node1-130-07 - beginning to initialize
>>>>>>>>>>>>>>> Daemon was launched on node1-130-06 - beginning to initialize
>>>>>>>>>>>>>>> Daemon [[42202,0],3] checking in as pid 7178 on host node1-130-03
>>>>>>>>>>>>>>> [node1-130-03:07178] [[42202,0],3] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>> Daemon [[42202,0],2] checking in as pid 13581 on host node1-130-02
>>>>>>>>>>>>>>> [node1-130-02:13581] [[42202,0],2] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>> Daemon [[42202,0],1] checking in as pid 17220 on host node1-130-01
>>>>>>>>>>>>>>> [node1-130-01:17220] [[42202,0],1] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>> Daemon [[42202,0],5] checking in as pid 6663 on host node1-130-05
>>>>>>>>>>>>>>> [node1-130-05:06663] [[42202,0],5] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>> Daemon [[42202,0],8] checking in as pid 6683 on host node1-130-08
>>>>>>>>>>>>>>> [node1-130-08:06683] [[42202,0],8] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>> Daemon [[42202,0],7] checking in as pid 7877 on host node1-130-07
>>>>>>>>>>>>>>> [node1-130-07:07877] [[42202,0],7] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>> Daemon [[42202,0],4] checking in as pid 7735 on host node1-130-04
>>>>>>>>>>>>>>> [node1-130-04:07735] [[42202,0],4] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>> Daemon [[42202,0],6] checking in as pid 8451 on host node1-130-06
>>>>>>>>>>>>>>> [node1-130-06:08451] [[42202,0],6] orted: up and running - waiting for commands!
>>>>>>>>>>>>>>> srun: error: node1-130-03: task 2: Exited with exit code 1
>>>>>>>>>>>>>>> srun: Terminating job step 657040.1
>>>>>>>>>>>>>>> srun: error: node1-130-02: task 1: Exited with exit code 1
>>>>>>>>>>>>>>> slurmd[node1-130-04]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>>>>>>>>>> slurmd[node1-130-07]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>>>>>>>>>> slurmd[node1-130-06]: *** STEP 657040.1 KILLED AT 2014-08-12T12:59:07 WITH SIGNAL 9 ***
>>>>>>>>>>>>>>> srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
>>>>>>>>>>>>>>> srun: error: node1-130-01: task 0: Exited with exit code 1
>>>>>>>>>>>>>>> srun: error: node1-130-05: task 4: Exited with exit code 1
>>>>>>>>>>>>>>> srun: error: node1-130-08: task 7: Exited with exit code 1
>>>>>>>>>>>>>>> srun: error: node1-130-07: task 6: Exited with exit code 1
>>>>>>>>>>>>>>> srun: error: node1-130-04: task 3: Killed
>>>>>>>>>>>>>>> srun: error: node1-130-06: task 5: Killed
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>> An ORTE daemon has unexpectedly failed after launch and before
>>>>>>>>>>>>>>> communicating back to mpirun. This could be caused by a number
>>>>>>>>>>>>>>> of factors, including an inability to create a connection back
>>>>>>>>>>>>>>> to mpirun due to a lack of common network interfaces and/or no
>>>>>>>>>>>>>>> route found between them. Please check network connectivity
>>>>>>>>>>>>>>> (including firewalls and network routing requirements).
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] orted_cmd: received halt_vm cmd
>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: close: component oob closed
>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: close: unloading component oob
>>>>>>>>>>>>>>> [compiler-2:08780] [[42202,0],0] TCP SHUTDOWN
>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: close: component tcp closed
>>>>>>>>>>>>>>> [compiler-2:08780] mca: base: close: unloading component tcp
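Since the orted daemons come up on the compute nodes but never report back to mpirun on compiler-2, Jeff's firewall question can be probed directly. A sketch, assuming passwordless ssh to a compute node, that 10.128.0.4 is one of compiler-2's addresses reachable from the nodes (it appears in the traces above), and that nc is installed; the port must be taken from a run that is still in progress, since mpirun binds a new one each time:

$ ssh node1-130-01 nc -zv 10.128.0.4 38420    # can the node open a TCP connection back to mpirun's OOB port?
$ iptables -L -n                              # run as root on compiler-2: any DROP/REJECT rules that would block it?

If the nc probe fails against a port mpirun is actually listening on, a firewall or routing problem between the compute nodes and compiler-2 is the most likely cause of the "ORTE daemon has unexpectedly failed" message.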