I have tried it. If another MPI process is already running on the node, the
command runs:

$ricardo$ /opt/openmpi/bin/mpirun  --mca plm_rsh_no_tree_spawn 1 -mca
plm_base_verbose 10 -host nexus16 ompi_info
[nexus10.nlroc:27397] mca: base: components_register: registering plm
components
[nexus10.nlroc:27397] mca: base: components_register: found loaded
component isolated
[nexus10.nlroc:27397] mca: base: components_register: component isolated
has no register or open function
[nexus10.nlroc:27397] mca: base: components_register: found loaded
component rsh
[nexus10.nlroc:27397] mca: base: components_register: component rsh
register function successful
[nexus10.nlroc:27397] mca: base: components_register: found loaded
component slurm
[nexus10.nlroc:27397] mca: base: components_register: component slurm
register function successful
[nexus10.nlroc:27397] mca: base: components_open: opening plm components
[nexus10.nlroc:27397] mca: base: components_open: found loaded component
isolated
[nexus10.nlroc:27397] mca: base: components_open: component isolated open
function successful
[nexus10.nlroc:27397] mca: base: components_open: found loaded component rsh
[nexus10.nlroc:27397] mca: base: components_open: component rsh open
function successful
[nexus10.nlroc:27397] mca: base: components_open: found loaded component
slurm
[nexus10.nlroc:27397] mca: base: components_open: component slurm open
function successful
[nexus10.nlroc:27397] mca:base:select: Auto-selecting plm components
[nexus10.nlroc:27397] mca:base:select:(  plm) Querying component [isolated]
[nexus10.nlroc:27397] mca:base:select:(  plm) Query of component [isolated]
set priority to 0
[nexus10.nlroc:27397] mca:base:select:(  plm) Querying component [rsh]
[nexus10.nlroc:27397] mca:base:select:(  plm) Query of component [rsh] set
priority to 10
[nexus10.nlroc:27397] mca:base:select:(  plm) Querying component [slurm]
[nexus10.nlroc:27397] mca:base:select:(  plm) Skipping component [slurm].
Query failed to return a module
[nexus10.nlroc:27397] mca:base:select:(  plm) Selected component [rsh]
[nexus10.nlroc:27397] mca: base: close: component isolated closed
[nexus10.nlroc:27397] mca: base: close: unloading component isolated
[nexus10.nlroc:27397] mca: base: close: component slurm closed
[nexus10.nlroc:27397] mca: base: close: unloading component slurm
[nexus10.nlroc:27397] [[52326,0],0] plm:base:receive update proc state
command from [[52326,0],1]
[nexus10.nlroc:27397] [[52326,0],0] plm:base:receive got update_proc_state
for job [52326,1]
[nexus16.nlroc:59687] mca: base: components_register: registering plm
components
[nexus16.nlroc:59687] mca: base: components_register: found loaded
component isolated
[nexus16.nlroc:59687] mca: base: components_register: component isolated
has no register or open function
[nexus16.nlroc:59687] mca: base: components_register: found loaded
component rsh
[nexus16.nlroc:59687] mca: base: components_register: component rsh
register function successful
[nexus16.nlroc:59687] mca: base: components_register: found loaded
component slurm
[nexus16.nlroc:59687] mca: base: components_register: component slurm
register function successful
                 Package: Open MPI XXXX@nexus10.nlroc Distribution
                Open MPI: 1.8.1
  Open MPI repo revision: r31483
   Open MPI release date: Apr 22, 2014
                Open RTE: 1.8.1
…

but if the compute node does not already have an MPI process running on it,
it hangs:

/opt/openmpi/bin/mpirun  --mca plm_rsh_no_tree_spawn 1 -mca
plm_base_verbose 10 -host nexus17 ompi_info
[nexus10.nlroc:27438] mca: base: components_register: registering plm
components
[nexus10.nlroc:27438] mca: base: components_register: found loaded
component isolated
[nexus10.nlroc:27438] mca: base: components_register: component isolated
has no register or open function
[nexus10.nlroc:27438] mca: base: components_register: found loaded
component rsh
[nexus10.nlroc:27438] mca: base: components_register: component rsh
register function successful
[nexus10.nlroc:27438] mca: base: components_register: found loaded
component slurm
[nexus10.nlroc:27438] mca: base: components_register: component slurm
register function successful
[nexus10.nlroc:27438] mca: base: components_open: opening plm components
[nexus10.nlroc:27438] mca: base: components_open: found loaded component
isolated
[nexus10.nlroc:27438] mca: base: components_open: component isolated open
function successful
[nexus10.nlroc:27438] mca: base: components_open: found loaded component rsh
[nexus10.nlroc:27438] mca: base: components_open: component rsh open
function successful
[nexus10.nlroc:27438] mca: base: components_open: found loaded component
slurm
[nexus10.nlroc:27438] mca: base: components_open: component slurm open
function successful
[nexus10.nlroc:27438] mca:base:select: Auto-selecting plm components
[nexus10.nlroc:27438] mca:base:select:(  plm) Querying component [isolated]
[nexus10.nlroc:27438] mca:base:select:(  plm) Query of component [isolated]
set priority to 0
[nexus10.nlroc:27438] mca:base:select:(  plm) Querying component [rsh]
[nexus10.nlroc:27438] mca:base:select:(  plm) Query of component [rsh] set
priority to 10
[nexus10.nlroc:27438] mca:base:select:(  plm) Querying component [slurm]
[nexus10.nlroc:27438] mca:base:select:(  plm) Skipping component [slurm].
Query failed to return a module
[nexus10.nlroc:27438] mca:base:select:(  plm) Selected component [rsh]
[nexus10.nlroc:27438] mca: base: close: component isolated closed
[nexus10.nlroc:27438] mca: base: close: unloading component isolated
[nexus10.nlroc:27438] mca: base: close: component slurm closed
[nexus10.nlroc:27438] mca: base: close: unloading component slurm

and it stops there.
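
For reference, this is the minimal check that can be run from the head node to
see whether anything is launched on the idle node at all (a sketch; it assumes
passwordless ssh from the head node, and --debug-daemons is mpirun's standard
daemon-debug option, nothing specific to this setup):

# check that the head node reaches the idle node over ssh without a password
ssh nexus17 hostname

# rerun the failing case with daemon output to see whether orted ever starts
/opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca plm_base_verbose 10 \
    --debug-daemons -host nexus17 ompi_info

If no orted output from nexus17 shows up, that points at the ssh launch itself
rather than at anything after the daemon starts.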




On Mon, Jul 14, 2014 at 8:56 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Hmmm...no, it worked just fine for me. It sounds like something else is
> going on.
>
> Try configuring OMPI with --enable-debug, and then add -mca
> plm_base_verbose 10 to get a better sense of what is going on.
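>
> For example, a rebuild along these lines (just a sketch; keep whatever other
> configure options were used for the original build):
>
> ./configure --prefix=/opt/openmpi --enable-debug CC=icc CXX=icpc FC=ifort
> make all install
>
> and then rerun the hanging command with the extra verbosity:
>
> /opt/openmpi/bin/mpirun --mca plm_rsh_no_tree_spawn 1 -mca plm_base_verbose 10 -hostfile hostfile ompi_info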
>
>
> On Jul 14, 2014, at 10:27 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> I confess I haven't tested no_tree_spawn in ages, so it is quite possible
> it has suffered bit rot. I can try to take a look at it in a bit
>
>
> On Jul 14, 2014, at 10:13 AM, Ricardo Fernández-Perea <
> rfernandezpe...@gmail.com> wrote:
>
> Thank you for the fast answer
>
> While that resolves my problem with cross-node ssh authentication, a command such as
>
> /opt/openmpi/bin/mpirun  --mca mtl mx --mca pml cm --mca
> plm_rsh_no_tree_spawn 1 -hostfile hostfile ompi_info
>
> just hangs with no output, and although an ssh connection is made, no orte
> program is started on the destination nodes.
>
> and while
>
> /opt/openmpi/bin/mpirun  -host host18 ompi_info
>
> works
>
> /opt/openmpi/bin/mpirun  --mca plm_rsh_no_tree_spawn 1 -host host18
> ompi_info
>
> hangs. Is there some condition on the use of this parameter?
>
> Yours truly
>
> Ricardo
>
>
>
> On Mon, Jul 14, 2014 at 6:35 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> During the 1.7 series and for all follow-on series, OMPI changed to a
>> mode where it launches a daemon on all allocated nodes at the startup of
>> mpirun. This allows us to determine the hardware topology of the nodes and
>> take that into account when mapping. You can override that behavior by
>> either adding --novm to your cmd line (which will impact your
>> mapping/binding options), or by specifying the hosts to use by editing your
>> hostfile, or adding --host host1,host2 to your cmd line
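>>
>> For instance (a sketch reusing the hosts and options from the original post,
>> with ompi_info as the test program; both forms avoid starting daemons on
>> nodes that will not run any process):
>>
>> /opt/openmpi/bin/mpirun --mca mtl mx --mca pml cm --host host1,host2 ompi_info
>>
>> or
>>
>> /opt/openmpi/bin/mpirun --novm --mca mtl mx --mca pml cm -hostfile hostfile -np 16 ompi_info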
>>
>> The rsh launcher defaults to a tree-based pattern, thus requiring that we
>> be able to ssh from one compute node to another. You can change that to a
>> less scalable direct mode by adding
>>
>> --mca plm_rsh_no_tree_spawn 1
>>
>> to the cmd line
>>
>>
>> On Jul 14, 2014, at 9:21 AM, Ricardo Fernández-Perea <
>> rfernandezpe...@gmail.com> wrote:
>>
>> > I'm trying to update to Open MPI 1.8.1 over ssh and Myrinet,
>> >
>> > running a command such as
>> >
>> > /opt/openmpi/bin/mpirun --verbose --mca mtl mx --mca pml cm -hostfile hostfile -np 16
>> >
>> > when the hostfile contains only two nodes, such as
>> >
>> > host1 slots=8 max-slots=8
>> > host2 slots=8 max-slots=8
>> >
>> > it runs perfectly, but when the hostfile has a third node, such as
>> >
>> >
>> > host1 slots=8 max-slots=8
>> > host2 slots=8 max-slots=8
>> > host3 slots=8 max-slots=8
>> >
>> > it tries to establish an ssh connection between the running host1 and
>> > host3 (which should not run any process); that connection fails, hanging
>> > the job without any error message.
>> >
>> >
>> > my ompi_info is as follow
>> >
>> >                 Package: Open MPI XXX Distribution
>> >                 Open MPI: 1.8.1
>> >   Open MPI repo revision: r31483
>> >    Open MPI release date: Apr 22, 2014
>> >                 Open RTE: 1.8.1
>> >   Open RTE repo revision: r31483
>> >    Open RTE release date: Apr 22, 2014
>> >                     OPAL: 1.8.1
>> >       OPAL repo revision: r31483
>> >        OPAL release date: Apr 22, 2014
>> >                  MPI API: 3.0
>> >             Ident string: 1.8.1
>> >                   Prefix: /opt/openmpi
>> >  Configured architecture: x86_64-apple-darwin9.8.0
>> >           Configure host: XXXX
>> >            Configured by: XXXX
>> >            Configured on: Thu Jun 12 10:37:33 CEST 2014
>> >           Configure host: XXXX
>> >                 Built by: XXXX
>> >                 Built on: Thu Jun 12 11:13:16 CEST 2014
>> >               Built host: XXXX
>> >               C bindings: yes
>> >             C++ bindings: yes
>> >              Fort mpif.h: yes (single underscore)
>> >             Fort use mpi: yes (full: ignore TKR)
>> >        Fort use mpi size: deprecated-ompi-info-value
>> >         Fort use mpi_f08: yes
>> >  Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
>> >                           limitations in the ifort compiler, does not support
>> >                           the following: array subsections, direct passthru
>> >                           (where possible) to underlying Open MPI's C
>> >                           functionality
>> >   Fort mpi_f08 subarrays: no
>> >            Java bindings: no
>> >   Wrapper compiler rpath: unnecessary
>> >               C compiler: icc
>> >      C compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icc
>> >   C compiler family name: INTEL
>> >       C compiler version: 1110.20091130
>> >             C++ compiler: icpc
>> >    C++ compiler absolute: /opt/intel/Compiler/11.1/080/bin/intel64/icpc
>> >            Fort compiler: ifort
>> >        Fort compiler abs: /opt/intel/Compiler/11.1/080/bin/intel64/ifort
>> >          Fort ignore TKR: yes (!DEC$ ATTRIBUTES NO_ARG_CHECK ::)
>> >    Fort 08 assumed shape: no
>> >       Fort optional args: yes
>> >       Fort BIND(C) (all): yes
>> >       Fort ISO_C_BINDING: yes
>> >  Fort SUBROUTINE BIND(C): yes
>> >        Fort TYPE,BIND(C): yes
>> >  Fort T,BIND(C,name="a"): yes
>> >             Fort PRIVATE: yes
>> >           Fort PROTECTED: yes
>> >            Fort ABSTRACT: yes
>> >        Fort ASYNCHRONOUS: yes
>> >           Fort PROCEDURE: yes
>> >  Fort f08 using wrappers: yes
>> >              C profiling: yes
>> >            C++ profiling: yes
>> >    Fort mpif.h profiling: yes
>> >   Fort use mpi profiling: yes
>> >    Fort use mpi_f08 prof: yes
>> >           C++ exceptions: no
>> >           Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes,
>> >                           OMPI progress: no, ORTE progress: yes, Event lib: yes)
>> >            Sparse Groups: no
>> >   Internal debug support: no
>> >   MPI interface warnings: yes
>> >      MPI parameter check: runtime
>> > Memory profiling support: no
>> > Memory debugging support: no
>> >          libltdl support: yes
>> >    Heterogeneous support: no
>> >  mpirun default --prefix: no
>> >          MPI I/O support: yes
>> >        MPI_WTIME support: gettimeofday
>> >      Symbol vis. support: yes
>> >    Host topology support: yes
>> >           MPI extensions:
>> >    FT Checkpoint support: no (checkpoint thread: no)
>> >    C/R Enabled Debugging: no
>> >      VampirTrace support: yes
>> >   MPI_MAX_PROCESSOR_NAME: 256
>> >     MPI_MAX_ERROR_STRING: 256
>> >      MPI_MAX_OBJECT_NAME: 64
>> >         MPI_MAX_INFO_KEY: 36
>> >         MPI_MAX_INFO_VAL: 256
>> >        MPI_MAX_PORT_NAME: 1024
>> >   MPI_MAX_DATAREP_STRING: 128
>> >
>> >
>