On Thu, 2005-10-20 at 22:03 +0400, Konstantin Karganov wrote:

> > However, we're quite open to other approaches. Because of the nature of
> > our integration with a variety of different run-time environments, our
> > startup is not a shell script -- mpirun ("orterun" is its real name;
> > "mpirun" is a sym link to orterun) is a compiled executable.
>
> Surely, I saw that mpirun is the orterun executable :)
> And this means that to add some features I need to rebuild it (and some
> run-time libs probably) each time.
Correct.

> > What are the requirements of your debugger? Do you attempt to launch
> > the MPI processes yourself, or do you attach to them after they are
> > launched (which is what TotalView does)?
>
> It is supposed to attach GDB to each process after it has launched, so the
> TotalView interface goes well, except that its details are hardcoded in
> the source of orte/tools/orterun (as you may guess, I don't have an
> executable named "totalview", etc.).

You and Chris G. raise a good point -- another parallel debugger vendor
has contacted me about the same issue (their debugger does not have an
executable named "totalview"). In off-list iterations with him, we
decided on some kind of format like:

    mpirun [--debugger <name>] --debug ...

The intent here is to make the common case easy for the user, but also
to allow flexibility in which back-end debugger is invoked.

First -- the common case:

    mpirun --debug -np 4 a.out

will invoke whatever back-end debugger the user has, with the proper
argv to get mpirun and "-np 4 a.out" passed back to it.

--debugger is a synonym for an MCA parameter, so it can be set in a
variety of ways (e.g., command line, environment variable, or in a
file). The string parameter for --debugger can specify multiple
different debuggers (and associated command lines -- with string
substitution -- to invoke those debuggers); OMPI's mpirun will search
for the first debugger that it can find in the current PATH and invoke
it. For example, we'll probably have a default value for --debugger
something like:

    "totalview mpirun -a @mpirun_args@ : fx2 mpirun -a @mpirun_args@"

Assume that the user invoked "mpirun --debug -np 4 a.out". This would
tell OMPI's mpirun to first search for "totalview" in the current
$PATH. If it doesn't find totalview, it then searches for "fx2" in the
$PATH; if fx2 is found, mpirun will exec "fx2 mpirun -a -np 4 a.out".
And, of course, anyone can override that default value (and we're open
to adding more -- TV and FX2 are the only ones that I'm aware of at the
moment). Also, this only works well for cases where we want to exec a
new application to invoke the debugger. Specifically, using "--debug"
to start under TV or FX2 is simply syntactic sugar for invoking it
yourself, but we've found that users tend to like this.

This is the current plan (I haven't gotten around to implementing it
yet -- it's probably only 2-3 hours' worth of work, but it hasn't been
a high priority yet). Comments?

> I'd like to know when and where do
> the functions from orterun/totalview.{h,c} get called, do I need to write
> my own file like this, etc. In other words, "the debugger adder reference
> manual" :)

Right now, there is no such manual -- we only added the TV stuff
according to what TV (and FX2 and DDT) require. These functions are
always invoked inside mpirun -- one just before we actually launch the
processes, and the other right after we have confirmation that they're
all blocking inside MPI_INIT waiting for the debugger to attach. Read
the TV specifications about how they attach -- if you have a different
scheme, let's talk...

As you probably know, OMPI is fundamentally based upon a component
architecture. We could open this up to making the parallel debugging
stuff be a component and, as such, do something totally different for
different debuggers.

> Currently I launch gdb's on remote processes via ssh (as MPICH does), but
> probably it will be better to use orte framework capabilities for this.
> Don't know yet how.

Gotcha; not a bad idea. Might fit nicely into having support for your
debugger be a component...?

When making a new kind of component for OMPI, we always ask ourselves:
what, abstractly, does this thing need to do? Assume that we already
have controls that tell the MPI processes that they're being debugged
(or not).
If they are, they'll need to wait upon some kind of notification from
the debugger indicating that it has attached before continuing (right
now, this happens at the very, very end of MPI_INIT; they wait for the
value of a variable to change). Additionally, the debugger needs to be
able to discover the nodename/PID pairs of the MPI processes of
interest. For basic attaching purposes, I think that these are the main
points. Any other ideas?

> In general, are there any ompi/orte architecture description docs, other
> than short schemes in your publications? It's too general there and too
> detailed in sources and doxygen docs. Some intermediate "how all this
> works together" doc is needed to assemble the whole picture...
> For me, I do not understand it completely.

The Open Run-Time Environment (ORTE) layer in OMPI is responsible for
all this kind of stuff -- it's all the things that happen before
MPI_INIT is ever reached (hence, "orterun"). There's a fairly
complicated dance that occurs to spawn a "job" (a collection of
individual processes). I think the two main things you want are:

1. The information about the MPI processes in the ORTE job of interest
   (are you interested in handling MPI-2 dynamic situations?). Right
   now, this is only available in the totalview.c code in orterun (per
   the TV specs). But as I mentioned, we could do something else.

2. How to launch your debugger agents alongside the MPI processes of
   interest. Since we have little/no documentation about the internals
   at this point, I'm admittedly waving my hands here, but essentially
   you'll call orte_rmgr.spawn(), very similar to the invocation in
   orterun.c. 75% of orterun.c is setting up the arguments to spawn()
   (not because the arguments are complicated, but rather because we
   allow quite complex command line argument forms to orterun); the
   remaining 25% is waiting for the various notifications from ORTE
   that the job is dead.
We might need a little extra logic here to ensure that your job is
literally launched alongside the processes of interest, but this is
certainly do-able.

> > Open MPI uses orterun as its launcher, not the first MPI process.
> > Hence, it is the one that TotalView gets its information from (in that
> > sense, it's similar to the MPICH model -- there is one coordinator; it's
> > just that it's orterun, not the first MPI process). Once orterun
> > receives notification that all the MPI processes have started, it gives
> > the nodename/PID information of each process to TotalView, which then
> > launches its own debugger processes on those nodes and attaches to the
> > processes.
>
> Hm.. with MPICH I use the first gdb copy to get the info from the 0-th
> process and then continue to use it as a node debugger; here I'll have to
> use one more gdb to get the process table out of the orterun process? And
> how to do this in a safe way?

In the current implementation, yes, you'll need another gdb (you have
to remember where this stuff came from -- TV's view of the world is to
have "one" master debugger that controls all the processes, so having a
separate "starter" process in addition to the MPI processes was no big
deal). We could do something different, though, such as dump out the
information to a file; or, if you're actually integrated in as a
component, you could get the information directly (i.e., via an
API)...? The possibilities here are open.

> > You probably get a "stopped" message when you try to bg orterun because
> > the shell thinks that it is waiting for input from stdin, because we
> > didn't close it.
>
> Actually this shouldn't matter. Many programs don't close stdin but
> nothing prevents them from running in background until they try to
> read input. The same "Hello world" application runs well with MPICH
> "mpirun -np 3 a.out &"
>
> Best regards,
> Konstantin.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/