On Thu, 2005-10-20 at 13:04 +0400, Konstantin Karganov wrote:

> I'm working on an MPI-debugger project and have questions on OpenMPI
> support for debugger tools. Any info or links to docs would be
> appreciated.

At the moment, we're sadly lacking in documentation (particularly
developer documentation). This list is probably your best bet
(actually, de...@open-mpi.org is probably better).

> At the moment, I'm interested in debug session startup implementation.
> Currently I have a code, that starts MPI program under debugger for
> MPICH, and it seems that OpenMPI is much different.

Correct. Open MPI's startup is fully modularized -- we can start under
a variety of different launchers (e.g., rsh/ssh, SLURM, PBS, etc.).
See the FAQ on the web site for more details
(http://www.open-mpi.org/faq/).

> 1. MPICH program startup is implemented as a set of shell-scripts and all
> I need is to put a debugger-specific startup script, that would be
> called from mpirun. What do I need to add a custom debugger support to
> OpenMPI? Do you plan to support several debuggers and how it is to be
> implemented?

Right now, we only support the TotalView API for attaching debuggers.
However, we're quite open to other approaches.

Because of the nature of our integration with a variety of different
run-time environments, our startup is not a shell script -- mpirun
("orterun" is its real name; "mpirun" is a symlink to orterun) is a
compiled executable.

What are the requirements of your debugger? Do you attempt to launch
the MPI processes yourself, or do you attach to them after they are
launched (which is what TotalView does)?

> 2. MPICH (at least for ch_p4 device) launches one process from mpirun
> and other processes are launched later inside MPI_Init. In
> orte/tools/orterun/totalview.cpp it is said that the same model is
> implemented, but in practice all processes start together long
> before MPI_Init. (BTW: what is this - mpirun that is running as a
> background process becomes "stopped" all the time I try to "bg" it?)
> What is the "correct" way and how it is supposed to get a debugger
> attached to all processes of the program?

Open MPI uses orterun as its launcher, not the first MPI process.
Hence, orterun is the process that TotalView gets its information from
(in that sense, it's similar to the MPICH model -- there is one
coordinator; it's just that it's orterun, not the first MPI process).
Once orterun receives notification that all the MPI processes have
started, it gives the nodename/PID information of each process to
TotalView, which then launches its own debugger processes on those
nodes and attaches to the processes.

You probably get a "stopped" message when you try to bg orterun
because we don't close stdin, so the shell thinks that orterun is
waiting for input from the terminal.

Does that help?

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
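
To illustrate the hand-off described above (the launcher publishing nodename/PID pairs for the debugger to read), here is a minimal C sketch. The symbol names (MPIR_PROCDESC, MPIR_proctable, MPIR_proctable_size, MPIR_debug_state, MPIR_Breakpoint) follow the MPIR process-acquisition convention that TotalView-style debuggers look for. The helper publish_proctable() and its arguments are hypothetical, and this is not Open MPI's actual code, only an outline of the general shape under those assumptions.

```c
#include <stdlib.h>

/* One entry per MPI process, in the layout a TotalView-style debugger
 * expects to find in the launcher's address space. */
typedef struct {
    char *host_name;        /* node the process runs on */
    char *executable_name;  /* path of the MPI executable */
    int   pid;              /* process ID on that node */
} MPIR_PROCDESC;

MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
volatile int MPIR_being_debugged = 0;  /* written by the debugger */
volatile int MPIR_debug_state = 0;

#define MPIR_DEBUG_SPAWNED 1

/* The debugger plants a breakpoint on this function; the body is
 * intentionally empty. */
void MPIR_Breakpoint(void) {}

/* Hypothetical helper: called by the launcher once it has been told
 * that all n processes have started. Returns 0 on success, -1 on error. */
int publish_proctable(char **hosts, char *exe, const int *pids, int n)
{
    if (n <= 0) {
        return -1;
    }
    MPIR_PROCDESC *table = malloc((size_t)n * sizeof *table);
    if (table == NULL) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        table[i].host_name = hosts[i];
        table[i].executable_name = exe;
        table[i].pid = pids[i];
    }
    MPIR_proctable = table;
    MPIR_proctable_size = n;
    MPIR_debug_state = MPIR_DEBUG_SPAWNED;
    MPIR_Breakpoint();  /* debugger stops here and reads the table */
    return 0;
}
```

A debugger that wants to attach the way TotalView does would start (or attach to) the launcher, set a breakpoint on MPIR_Breakpoint, and read MPIR_proctable when that breakpoint is hit to learn where each process lives.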