I have tried it. If another MPI process is already running on the compute node, the launch works:
$ricardo$ /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca plm_base_verbose 10 -host nexus16 ompi_info
[nexus10.nlroc:27397] mca: base: components_register: registering plm components
[nexus10.nlroc:27397] mca: base: components_register: found loaded component isolated
[nexus10.nlroc:27397] mca: base: components_register: component isolated has no register or open function
[nexus10.nlroc:27397] mca: base: components_register: found loaded component rsh
[nexus10.nlroc:27397] mca: base: components_register: component rsh register function successful
[nexus10.nlroc:27397] mca: base: components_register: found loaded component slurm
[nexus10.nlroc:27397] mca: base: components_register: component slurm register function successful
[nexus10.nlroc:27397] mca: base: components_open: opening plm components
[nexus10.nlroc:27397] mca: base: components_open: found loaded component isolated
[nexus10.nlroc:27397] mca: base: components_open: component isolated open function successful
[nexus10.nlroc:27397] mca: base: components_open: found loaded component rsh
[nexus10.nlroc:27397] mca: base: components_open: component rsh open function successful
[nexus10.nlroc:27397] mca: base: components_open: found loaded component slurm
[nexus10.nlroc:27397] mca: base: components_open: component slurm open function successful
[nexus10.nlroc:27397] mca:base:select: Auto-selecting plm components
[nexus10.nlroc:27397] mca:base:select:( plm) Querying component [isolated]
[nexus10.nlroc:27397] mca:base:select:( plm) Query of component [isolated] set priority to 0
[nexus10.nlroc:27397] mca:base:select:( plm) Querying component [rsh]
[nexus10.nlroc:27397] mca:base:select:( plm) Query of component [rsh] set priority to 10
[nexus10.nlroc:27397] mca:base:select:( plm) Querying component [slurm]
[nexus10.nlroc:27397] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[nexus10.nlroc:27397] mca:base:select:( plm) Selected component [rsh]
[nexus10.nlroc:27397] mca: base: close: component isolated closed
[nexus10.nlroc:27397] mca: base: close: unloading component isolated
[nexus10.nlroc:27397] mca: base: close: component slurm closed
[nexus10.nlroc:27397] mca: base: close: unloading component slurm
[nexus10.nlroc:27397] [[52326,0],0] plm:base:receive update proc state command from [[52326,0],1]
[nexus10.nlroc:27397] [[52326,0],0] plm:base:receive got update_proc_state for job [52326,1]
[nexus16.nlroc:59687] mca: base: components_register: registering plm components
[nexus16.nlroc:59687] mca: base: components_register: found loaded component isolated
[nexus16.nlroc:59687] mca: base: components_register: component isolated has no register or open function
[nexus16.nlroc:59687] mca: base: components_register: found loaded component rsh
[nexus16.nlroc:59687] mca: base: components_register: component rsh register function successful
[nexus16.nlroc:59687] mca: base: components_register: found loaded component slurm
[nexus16.nlroc:59687] mca: base: components_register: component slurm register function successful
Package: Open MPI XXXX@nexus10.nlroc Distribution
Open MPI: 1.8.1
Open MPI repo revision: r31483
Open MPI release date: Apr 22, 2014
Open RTE: 1.8.1
…

but if the compute node does not already have an MPI process running on it, the launch hangs:

/opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca plm_base_verbose 10 -host nexus17 ompi_info
[nexus10.nlroc:27438] mca: base: components_register: registering plm components
[nexus10.nlroc:27438] mca: base: components_register: found loaded component isolated
[nexus10.nlroc:27438] mca: base: components_register: component isolated has no register or open function
[nexus10.nlroc:27438] mca: base: components_register: found loaded component rsh
[nexus10.nlroc:27438] mca: base: components_register: component rsh register function successful
[nexus10.nlroc:27438] mca: base: components_register: found loaded component slurm
[nexus10.nlroc:27438] mca: base: components_register: component slurm register function successful
[nexus10.nlroc:27438] mca: base: components_open: opening plm components
[nexus10.nlroc:27438] mca: base: components_open: found loaded component isolated
[nexus10.nlroc:27438] mca: base: components_open: component isolated open function successful
[nexus10.nlroc:27438] mca: base: components_open: found loaded component rsh
[nexus10.nlroc:27438] mca: base: components_open: component rsh open function successful
[nexus10.nlroc:27438] mca: base: components_open: found loaded component slurm
[nexus10.nlroc:27438] mca: base: components_open: component slurm open function successful
[nexus10.nlroc:27438] mca:base:select: Auto-selecting plm components
[nexus10.nlroc:27438] mca:base:select:( plm) Querying component [isolated]
[nexus10.nlroc:27438] mca:base:select:( plm) Query of component [isolated] set priority to 0
[nexus10.nlroc:27438] mca:base:select:( plm) Querying component [rsh]
[nexus10.nlroc:27438] mca:base:select:( plm) Query of component [rsh] set priority to 10
[nexus10.nlroc:27438] mca:base:select:( plm) Querying component [slurm]
[nexus10.nlroc:27438] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[nexus10.nlroc:27438] mca:base:select:( plm) Selected component [rsh]
[nexus10.nlroc:27438] mca: base: close: component isolated closed
[nexus10.nlroc:27438] mca: base: close: unloading component isolated
[nexus10.nlroc:27438] mca: base: close: component slurm closed
[nexus10.nlroc:27438] mca: base: close: unloading component slurm

and it stops there.


On Mon, Jul 14, 2014 at 8:56 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Hmmm...no, it worked just fine for me. It sounds like something else is going on.
>
> Try configuring OMPI with --enable-debug, and then add -mca plm_base_verbose 10 to get a better sense of what is going on.
>
>
> On Jul 14, 2014, at 10:27 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> I confess I haven't tested no_tree_spawn in ages, so it is quite possible it has suffered bit rot. I can try to take a look at it in a bit.
>
>
> On Jul 14, 2014, at 10:13 AM, Ricardo Fernández-Perea <rfernandezpe...@gmail.com> wrote:
>
> Thank you for the fast answer.
>
> While that resolves my problem with cross-ssh authentication, a command such as
>
> /opt/openmpi/bin/mpirun --mca mtl mx --mca pml cm --mca plm_rsh_no_tree_spawn 1 -hostfile hostfile ompi_info
>
> just hangs with no output, and although there is an ssh connection, no ORTE program is started on the destination nodes.
>
> And while
>
> /opt/openmpi/bin/mpirun -host host18 ompi_info
>
> works,
>
> /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -host host18 ompi_info
>
> hangs. Is there some condition on the use of this parameter?
>
> Yours truly,
>
> Ricardo
>
>
> On Mon, Jul 14, 2014 at 6:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> During the 1.7 series and for all follow-on series, OMPI changed to a mode where it launches a daemon on all allocated nodes at the startup of mpirun. This allows us to determine the hardware topology of the nodes and take that into account when mapping. You can override that behavior either by adding --novm to your cmd line (which will impact your mapping/binding options), by specifying the hosts to use by editing your hostfile, or by adding --host host1,host2 to your cmd line.
>>
>> The rsh launcher defaults to a tree-based pattern, thus requiring that we be able to ssh from one compute node to another. You can change that to a less scalable direct mode by adding
>>
>> --mca plm_rsh_no_tree_spawn 1
>>
>> to the cmd line.
>>
>>
>> On Jul 14, 2014, at 9:21 AM, Ricardo Fernández-Perea <rfernandezpe...@gmail.com> wrote:
>>
>> > I'm trying to update to Open MPI 1.8.1 through ssh and Myrinet,
>> > running a command such as
>> >
>> > /opt/openmpi/bin/mpirun --verbose --mca mtl mx --mca pml cm -hostfile hostfile -np 16
>> >
>> > When the hostfile contains only two nodes, such as
>> >
>> > host1 slots=8 max-slots=8
>> > host2 slots=8 max-slots=8
>> >
>> > it runs perfectly, but when the hostfile has a third node, such as
>> >
>> > host1 slots=8 max-slots=8
>> > host2 slots=8 max-slots=8
>> > host3 slots=8 max-slots=8
>> >
>> > it tries to establish an ssh connection between the running host1 and host3 (which should not run any process); that connection fails, hanging the process without any signal.
>> >
>> > my ompi_info is as follow
>> >
>> > Package: Open MPI XXX Distribution
>> > Open MPI: 1.8.1
>> > Open MPI repo revision: r31483
>> > Open MPI release date: Apr 22, 2014
>> > Open RTE: 1.8.1
>> > Open RTE repo revision: r31483
>> > Open RTE release date: Apr 22, 2014
>> > OPAL: 1.8.1
>> > OPAL repo revision: r31483
>> > OPAL release date: Apr 22, 2014
>> > MPI API: 3.0
>> > Ident string: 1.8.1
>> > Prefix: /opt/openmpi
>> > Configured architecture: x86_64-apple-darwin9.8.0
>> > Configure host: XXXX
>> > Configured by: XXXX
>> > Configured on: Thu Jun 12 10:37:33 CEST 2014
>> > Configure host: XXXX
>> > Built by: XXXX
>> > Built on: Thu Jun 12 11:13:16 CEST 2014
>> > Built host: XXXX
>> > C bindings: yes
>> > C++ bindings: yes
>> > Fort mpif.h: yes (single underscore)
>> > Fort use mpi: yes (full: ignore TKR)
>> > Fort use mpi size: deprecated-ompi-info-value
>> > Fort use mpi_f08: yes
>> > Fort mpi_f08 compliance: The mpi_f08 module is available, but due to limitations in the ifort compiler, does not support the following: array subsections, direct passthru (where possible) to underlying Open MPI's C functionality
>> > Fort mpi_f08 subarrays: no
>> > Java bindings: no
>> > Wrapper compiler rpath: unnecessary
>> > C compiler: icc
>> > C compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icc
>> > C compiler family name: INTEL
>> > C compiler version: 1110.20091130
>> > C++ compiler: icpc
>> > C++ compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icpc
>> > Fort compiler: ifort
>> > Fort compiler abs: /opt/intel/Compiler/11.1/080/bin/intel64/ifort
>> > Fort ignore TKR: yes (!DEC$ ATTRIBUTES NO_ARG_CHECK ::)
>> > Fort 08 assumed shape: no
>> > Fort optional args: yes
>> > Fort BIND(C) (all): yes
>> > Fort ISO_C_BINDING: yes
>> > Fort SUBROUTINE BIND(C): yes
>> > Fort TYPE,BIND(C): yes
>> > Fort T,BIND(C,name="a"): yes
>> > Fort PRIVATE: yes
>> > Fort PROTECTED: yes
>> > Fort ABSTRACT: yes
>> > Fort ASYNCHRONOUS: yes
>> > Fort PROCEDURE: yes
>> > Fort f08 using wrappers: yes
>> > C profiling: yes
>> > C++ profiling: yes
>> > Fort mpif.h profiling: yes
>> > Fort use mpi profiling: yes
>> > Fort use mpi_f08 prof: yes
>> > C++ exceptions: no
>> > Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
>> > Sparse Groups: no
>> > Internal debug support: no
>> > MPI interface warnings: yes
>> > MPI parameter check: runtime
>> > Memory profiling support: no
>> > Memory debugging support: no
>> > libltdl support: yes
>> > Heterogeneous support: no
>> > mpirun default --prefix: no
>> > MPI I/O support: yes
>> > MPI_WTIME support: gettimeofday
>> > Symbol vis. support: yes
>> > Host topology support: yes
>> > MPI extensions:
>> > FT Checkpoint support: no (checkpoint thread: no)
>> > C/R Enabled Debugging: no
>> > VampirTrace support: yes
>> > MPI_MAX_PROCESSOR_NAME: 256
>> > MPI_MAX_ERROR_STRING: 256
>> > MPI_MAX_OBJECT_NAME: 64
>> > MPI_MAX_INFO_KEY: 36
>> > MPI_MAX_INFO_VAL: 256
>> > MPI_MAX_PORT_NAME: 1024
>> > MPI_MAX_DATAREP_STRING: 128
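
Given Ralph's explanation that the default rsh launch uses a tree-based pattern and therefore needs compute-node-to-compute-node ssh, one quick way to verify that prerequisite is a batch-mode ssh chain. This is only a sketch, not part of the original exchange; the node names are the ones used earlier in this thread, so substitute your own:

# head node -> compute node should work without any password prompt
ssh -o BatchMode=yes nexus16 hostname
# compute node -> compute node is what the tree-based launch relies on;
# if this prompts or hangs, the default tree spawn will hang the same way
ssh -o BatchMode=yes nexus16 ssh -o BatchMode=yes nexus17 hostname

With BatchMode=yes, ssh errors out instead of prompting, so a missing key shows up as an explicit failure rather than a silent hang.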
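
Regarding the observation that no ORTE program is started on the destination nodes: one way to confirm that from the head node, while the hanging mpirun is still running, is to look for the orted daemon on the remote host. Again just a sketch, using the node name from the failing run above:

# the [o] keeps grep from matching itself; no output means the rsh
# launcher never got as far as starting the daemon on that node
ssh nexus17 "ps -ef | grep '[o]rted'"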
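
Ralph's suggestion to configure OMPI with --enable-debug would look roughly like the following, run from the Open MPI 1.8.1 source directory. The prefix and compilers match what ompi_info reports above (which also shows "Internal debug support: no" for the current build); the parallel make level and doing the rebuild in place are assumptions about the local setup:

# rebuild with internal debug support enabled
./configure --prefix=/opt/openmpi --enable-debug CC=icc CXX=icpc FC=ifort
make -j 8 all
make install

After reinstalling, re-running the hanging command with -mca plm_base_verbose 10 as above should produce more detail from the rsh launcher than the default build does.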