On Thu, 2005-10-20 at 13:04 +0400, Konstantin Karganov wrote:

> I'm working on an MPI-debugger project and have questions on OpenMPI
> support for debugger tools. Any info or links to docs would be
> appreciated.

At the moment, we're sadly lacking in documentation (particularly
developer documentation).  This list is probably your best bet
(actually, de...@open-mpi.org is probably better).

> At the moment, I'm interested in debug session startup implementation.
> Currently I have a code, that starts MPI program under debugger for
> MPICH, and it seems that OpenMPI is much different.

Correct.  Open MPI's startup is fully modularized -- we can start under
a variety of different launchers (e.g., rsh/ssh, SLURM, PBS).
See the FAQ on the web site for more details
(http://www.open-mpi.org/faq/).

> 1. MPICH program startup is implemented as a set of shell-scripts and all
> I need is to put a debugger-specific startup script, that would be
> called from mpirun. What do I need to add a custom debugger support to
> OpenMPI? Do you plan to support several debuggers and how it is to be
> implemented?

Right now, we only support the TotalView API for attaching debuggers.
However, we're quite open to other approaches.  Because of the nature of
our integration with a variety of different run-time environments, our
startup is not a shell script -- mpirun ("orterun" is its real name;
"mpirun" is a symlink to orterun) is a compiled executable.

What are the requirements of your debugger?  Do you attempt to launch
the MPI processes yourself, or do you attach to them after they are
launched (which is what TotalView does)?

> 2. MPICH (at least for ch_p4 device) launches one process from mpirun
> and other processes are launched later inside MPI_Init. In
> orte/tools/orterun/totalview.cpp it is said that the same model is
> implemented, but in practice all processes start together long
> before MPI_Init. (BTW: what is this - mpirun that is running as a
> background process becomes "stopped" all the time I try to "bg" it?)
> What is the "correct" way and how it is supposed to get a debugger
> attached to all processes of the program?

Open MPI uses orterun as its launcher, not the first MPI process.
Hence, it is the one that TotalView gets its information from (in that
sense, it's similar to the MPICH model -- there is one coordinator; it's
just that it's orterun, not the first MPI process).  Once orterun
receives notification that all the MPI processes have started, it gives
the nodename/PID information of each process to TotalView, which then
launches its own debugger processes on those nodes and attaches to the
processes.
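
To make that hand-off more concrete, here is a minimal sketch of the kind
of process table a launcher can expose for a debugger to read.  It assumes
the MPIR-style symbol names (MPIR_proctable, MPIR_proctable_size,
MPIR_debug_state, MPIR_Breakpoint) conventionally used with TotalView;
publish_proctable() is a hypothetical helper for illustration only and is
not how orterun is actually written.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry per launched MPI process (MPIR-style layout, assumed here). */
typedef struct {
    char *host_name;        /* node the process runs on */
    char *executable_name;  /* path of the MPI executable */
    int   pid;              /* process ID on that node */
} MPIR_PROCDESC;

/* Symbols a debugger looks up in the launcher's address space. */
MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
volatile int MPIR_being_debugged = 0;  /* set by the debugger, not by us */
volatile int MPIR_debug_state = 0;

#define MPIR_DEBUG_SPAWNED 1

/* The debugger sets a breakpoint on this function; when the launcher
 * calls it, the debugger knows the table is complete and can attach. */
void MPIR_Breakpoint(void)
{
}

/* Hypothetical helper: fill the table once every process has reported
 * in, then signal the debugger.  Returns 0 on success, -1 on failure.
 * The table stores the caller's pointers; it does not copy the strings. */
int publish_proctable(char **hosts, char *exe, const int *pids, int n)
{
    assert(n > 0);

    MPIR_PROCDESC *table = malloc((size_t)n * sizeof *table);
    if (table == NULL) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        table[i].host_name = hosts[i];
        table[i].executable_name = exe;
        table[i].pid = pids[i];
    }

    MPIR_proctable = table;
    MPIR_proctable_size = n;
    MPIR_debug_state = MPIR_DEBUG_SPAWNED;
    MPIR_Breakpoint();
    return 0;
}
```

A debugger following this model would stop at MPIR_Breakpoint, read
MPIR_proctable_size entries out of MPIR_proctable, and attach to each
host/PID pair it finds there.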

You probably get a "stopped" message when you try to bg orterun because
we don't close stdin, so the shell thinks that orterun is waiting for
input from the terminal.

Does that help?

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
