Re: [OMPI users] [EXTERNAL] Issue with mpirun inside a container

2024-09-29 Thread Gilles Gouaillardet via users
Jeffrey,

You are invoking mpirun with the -H option, so mpirun inside your
container will basically run

    ssh ... orted ...

but the remote orted will not run in a container, hence the error message.

It is possible that you planned to run everything in the container, but
that for some reason Open MPI failed to figure out that the name in the
host file refers to the container. In that case, try without the -H
option, or try using localhost in the host file.
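
The two workarounds suggested above can be sketched as follows. This is only an illustration: `build_mpirun_cmd` is a hypothetical helper, and the process counts and `./a.out` are placeholders that do not come from the thread. The helper just prints the command lines so they can be reviewed before being run inside the container.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print an mpirun command line.
# Usage: build_mpirun_cmd NP APP [HOST]
build_mpirun_cmd() {
    np="$1"
    app="$2"
    host="${3:-}"
    if [ -n "$host" ]; then
        # Explicit host list: for a non-local name, mpirun starts orted
        # on that host (through ssh by default).
        printf 'mpirun -np %s -H %s %s\n' "$np" "$host" "$app"
    else
        # No -H: absent a hostfile or resource manager, the ranks are
        # started on the node where mpirun itself runs.
        printf 'mpirun -np %s %s\n' "$np" "$app"
    fi
}

# Variant 1: drop -H entirely (placeholder values).
build_mpirun_cmd 2 ./a.out
# Variant 2: name localhost explicitly (placeholder values).
build_mpirun_cmd 1 ./a.out localhost
```

Depending on the Open MPI version, each name given to -H may count as a single slot, so the value passed to -np may need adjusting for the second variant.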

Cheers,

Gilles

On Mon, Sep 30, 2024 at 1:34 AM Jeffrey Layton via users <
users@lists.open-mpi.org> wrote:

> Howard,
>
> I tried the first experiment of using orted instead of mpirun. The output
> is below.
>
>
> /usr/local/mpi/bin/orted: Error: unknown option "-np"
> Type '/usr/local/mpi/bin/orted --help' for usage.
> Usage: /usr/local/mpi/bin/orted [OPTION]...
> -d|--debug               Debug the OpenRTE
>    --daemonize           Daemonize the orted into the background
>    --debug-daemons       Enable debugging of OpenRTE daemons
>    --debug-daemons-file  Enable debugging of OpenRTE daemons, storing output
>                          in files
> -h|--help                This help message
>    --hnp                 Direct the orted to act as the HNP
>    --hnp-uri             URI for the HNP
>    -nodes|--nodes
>                          Regular expression defining nodes in system
>    -output-filename|--output-filename
>                          Redirect output from application processes into
>                          filename.rank
>    --parent-uri          URI for the parent if tree launch is enabled.
>    -report-bindings|--report-bindings
>                          Whether to report process bindings to stderr
>    --report-uri          Report this process' uri on indicated pipe
> -s|--spin                Have the orted spin until we can connect a debugger
>                          to it
>    --set-sid             Direct the orted to separate from the current
>                          session
>    --singleton-died-pipe
>                          Watch on indicated pipe for singleton termination
>    --test-suicide
>                          Suicide instead of clean abort after delay
>    --tmpdir              Set the root for the session directory tree
>    -tree-spawn|--tree-spawn
>                          Tree-based spawn in progress
>    -xterm|--xterm
>                          Create a new xterm window and display output from
>                          the specified ranks there
>
> For additional mpirun arguments, run 'mpirun --help '
>
> The following categories exist: general (Defaults to this option), debug,
> output, input, mapping, ranking, binding, devel (arguments useful to OMPI
> Developers), compatibility (arguments supported for backwards compatibility),
> launch (arguments to modify launch options), and dvm (Distributed Virtual
> Machine arguments).
>
>
>
> Then I tried adding the debug flag you mentioned and I got the same error:
>
> bash: line 1: /usr/local/mpi/bin/orted: No such file or directory
> --
> ORTE was unable to reliably start one or more daemons.
>
>
> I also tried a third experiment, using a container I have used before.
> It has an older version of Open MPI, but I get the same error as I get
> now:
>
>
> bash: line 1: /usr/local/mpi/bin/orted: No such file or directory
> --
> ORTE was unable to reliably start one or more daemons.
>
>
> This sounds like a path problem, but I'm not sure. Adding the path to
> MPI to $PATH and $LD_LIBRARY_PATH didn't change the error message.
>
> Thanks!
>
> Jeff
>
>
> --
> *From:* users  on behalf of Pritchard
> Jr., Howard via users 
> *Sent:* Friday, September 27, 2024 4:40 PM
> *To:* Open MPI Users 
> *Cc:* Pritchard Jr., Howard (EXTERNAL) 
> *Subject:* Re: [OMPI users] [EXTERNAL] Issue with mpirun inside a
> container
>
> Hello Jeff,
>
>
>
> As an experiment, why not try
>
>     docker run  /usr/local/mpi/bin/orted
>
> and report the results?
>
>
>
> Also, you may want to add --debug-daemons to the mpirun command line as
> another experiment.
>
>
>
> Howard
>
>
>
> *From: *users  on behalf of Jeffrey
> Layton via users 
> *Reply-To: *Open MPI Users 
> *Date: *Friday, September 27, 2024 at 1:08 PM
> *To: *Open MPI Users 
> *Cc: *Jeffrey Layton 
> *Subject: *[EXTERNAL] [OMPI users] Issue with mpirun inside a container
>
>
>
> Good afternoon,
>
>
>
> I'm getting an error message when I run "mpirun ... " inside a Docker
> container. The message:
>
>
>
>
>
> bash: line 1: /usr/local/mpi/bin/orted: No such file or directory
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or 

Re: [OMPI users] [EXTERNAL] Issue with mpirun inside a container

2024-09-29 Thread Jeffrey Layton via users
Howard,

I tried the first experiment of using orted instead of mpirun. The output is 
below.


/usr/local/mpi/bin/orted: Error: unknown option "-np"
Type '/usr/local/mpi/bin/orted --help' for usage.
Usage: /usr/local/mpi/bin/orted [OPTION]...
-d|--debug               Debug the OpenRTE
   --daemonize           Daemonize the orted into the background
   --debug-daemons       Enable debugging of OpenRTE daemons
   --debug-daemons-file  Enable debugging of OpenRTE daemons, storing output
                         in files
-h|--help                This help message
   --hnp                 Direct the orted to act as the HNP
   --hnp-uri             URI for the HNP
   -nodes|--nodes
                         Regular expression defining nodes in system
   -output-filename|--output-filename
                         Redirect output from application processes into
                         filename.rank
   --parent-uri          URI for the parent if tree launch is enabled.
   -report-bindings|--report-bindings
                         Whether to report process bindings to stderr
   --report-uri          Report this process' uri on indicated pipe
-s|--spin                Have the orted spin until we can connect a debugger
                         to it
   --set-sid             Direct the orted to separate from the current
                         session
   --singleton-died-pipe
                         Watch on indicated pipe for singleton termination
   --test-suicide
                         Suicide instead of clean abort after delay
   --tmpdir              Set the root for the session directory tree
   -tree-spawn|--tree-spawn
                         Tree-based spawn in progress
   -xterm|--xterm
                         Create a new xterm window and display output from
                         the specified ranks there

For additional mpirun arguments, run 'mpirun --help '

The following categories exist: general (Defaults to this option), debug,
output, input, mapping, ranking, binding, devel (arguments useful to OMPI
Developers), compatibility (arguments supported for backwards compatibility),
launch (arguments to modify launch options), and dvm (Distributed Virtual
Machine arguments).



Then I tried adding the debug flag you mentioned and I got the same error:

bash: line 1: /usr/local/mpi/bin/orted: No such file or directory
--
ORTE was unable to reliably start one or more daemons.


I also tried a third experiment, using a container I have used before.
It has an older version of Open MPI, but I get the same error as I get now:


bash: line 1: /usr/local/mpi/bin/orted: No such file or directory
--
ORTE was unable to reliably start one or more daemons.


This sounds like a path problem, but I'm not sure. Adding the path to MPI
to $PATH and $LD_LIBRARY_PATH didn't change the error message.
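
One way to narrow down a suspected path problem is to check, in the environment where the shell reports the failure, whether orted exists at the expected location and whether its directory is on PATH. The sketch below is illustrative only: `check_executable` and `path_contains` are hypothetical helpers, and the default path is simply the one printed in the error message. Running the same checks both inside the container and on any host named with -H would show which side is missing the file.

```shell
#!/bin/sh
# Default path is the one printed in the error message; override via ORTED.
ORTED="${ORTED:-/usr/local/mpi/bin/orted}"

# Report whether a file exists and is executable in the current environment.
check_executable() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Report whether directory $2 is an entry of the colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) echo yes ;;
        *) echo no ;;
    esac
}

check_executable "$ORTED"
path_contains "$PATH" "$(dirname "$ORTED")"
```

If the file is reported as missing on the side where the daemon is being started, changing PATH or LD_LIBRARY_PATH on the other side would not be expected to change the error.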

Thanks!

Jeff



From: users  on behalf of Pritchard Jr., 
Howard via users 
Sent: Friday, September 27, 2024 4:40 PM
To: Open MPI Users 
Cc: Pritchard Jr., Howard (EXTERNAL) 
Subject: Re: [OMPI users] [EXTERNAL] Issue with mpirun inside a container


Hello Jeff,



As an experiment, why not try

    docker run  /usr/local/mpi/bin/orted

and report the results?



Also, you may want to add --debug-daemons to the mpirun command line as
another experiment.
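
The two experiments could be spelled out roughly as below. This is a hedged sketch, not a command taken from the thread: the helpers are hypothetical and only print the command lines, and the image name, process count, and `./a.out` are placeholders. The orted probe passes only `--help`, an option that appears in orted's own usage text, so that no mpirun-specific option such as -np reaches orted.

```shell
#!/bin/sh
# Hypothetical helpers (not from the thread): they only print commands.
ORTED_PATH="/usr/local/mpi/bin/orted"   # path taken from the error message

# Print a docker command that runs orted with --help only.
# $1 is the image name, which the thread does not give.
orted_probe_cmd() {
    printf 'docker run --rm %s %s --help\n' "$1" "$ORTED_PATH"
}

# Print an mpirun command line with daemon debugging enabled.
# $1 is the process count, $2 the application binary.
debug_mpirun_cmd() {
    printf 'mpirun --debug-daemons -np %s %s\n' "$1" "$2"
}

orted_probe_cmd my-mpi-image   # "my-mpi-image" is a placeholder image name
debug_mpirun_cmd 2 ./a.out     # process count and binary are placeholders
```

If the image defines an ENTRYPOINT, the first command may behave differently from what is shown here.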



Howard



From: users  on behalf of Jeffrey Layton via 
users 
Reply-To: Open MPI Users 
Date: Friday, September 27, 2024 at 1:08 PM
To: Open MPI Users 
Cc: Jeffrey Layton 
Subject: [EXTERNAL] [OMPI users] Issue with mpirun inside a container



Good afternoon,



I'm getting an error message when I run "mpirun ... " inside a Docker 
container. The message:





bash: line 1: /usr/local/mpi/bin/orted: No such file or directory
--
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

*  compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (includin