> You and Chris G. raise a good point -- another parallel debugger vendor
> has contacted me about the same issue (their debugger does not have an
> executable named "totalview").
> <...>
> Comments?

Actually, the point is deeper than just a debugger naming question. A
high-quality MPI implementation should give the user more flexibility.
Debuggers may differ in their startup algorithms: MPICH (the same
example again) allows the user to write an arbitrary script for starting
a custom debugger. It works fine, launching this script at run time
(there is a naming convention), without the need to rebuild and
reinstall the library.

> Read the TV specifications about how they attach -- if you have a
> different scheme, let's talk...

We are not a mature product (like the famous TotalView) with
requirements of our own, so we are open to suggestions as well.
Currently I have implemented the startup for debugging MPICH programs
(using the TotalView interface; I know how it works), but this scheme
may be changed.

> > Currently I launch gdb's on remote processes via ssh (as MPICH does), but
> > probably it will be better to use orte framework capabilities for this.
> Gotcha; not a bad idea. Might fit nicely into having support for your
> debugger be a component...?

Actually, all I need is the same as what orte already does:

1. Launch the processes on all nodes.
2. Make sure they are successfully launched.
3. Get the array of handles to read from and write to each process.
4. Be able to stop the processes.
5. Probably send signals to processes (gdb uses SIGINT to interrupt
   execution).
6. Probably have the info about node names and PIDs, to display it and
   to implement points 4-5.

It looks just the same as for a usual run, but the devil is surely in
the details.

> I think the two main things you want are:
>
> 1. the information about the MPI processes in the ORTE job of interest
> (are you interested in handling MPI-2 dynamic situations?).

Not yet. Only MPI 1.2 support is planned for the first release.

> 2. <..>

I also might want:

3. The knowledge of "how it works", to be able to play with the code
   myself :)

> TV's view of the world is to have "one" master debugger that controls all
> the processes, so having a separate "starter" process in addition to the
> MPI processes was no big deal.

I'm trying to do it the same way: attach gdb to each process as a node
debugger and connect all of these to the main debugger process, which
has the GUI and implements all the "parallel" logic. The question was
merely how to do it: call "gdb orterun" and catch it somewhere on a
breakpoint, attach to orterun later, or something else.

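To make the six-point list above concrete, here is a minimal sketch in
Python. It is only an illustration under loud assumptions: local child
processes stand in for remote nodes, a tiny echo loop stands in for a
per-node gdb, and every name in it (NodeProcess, launch_all,
send_command, and so on) is hypothetical. It does not use or describe
any orte API, and it assumes a POSIX system for the signal handling.

```python
import signal
import socket
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class NodeProcess:
    """One launched process plus the data a front end would display."""
    host: str                # node name (point 6)
    proc: subprocess.Popen   # holds the PID and the pipe handles (points 3, 6)


# Hypothetical stand-in for a per-node debugger: echoes each line until EOF.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('echo ' + line.strip(), flush=True)\n"
)


def launch_all(count):
    """Points 1 and 3: start `count` local children with pipes attached."""
    host = socket.gethostname()
    return [
        NodeProcess(host, subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True))
        for _ in range(count)
    ]


def all_running(nodes):
    """Point 2: treat a process as launched if it has not exited."""
    return all(n.proc.poll() is None for n in nodes)


def send_command(node, line):
    """Point 3: write one line to the process and read one line back."""
    node.proc.stdin.write(line + "\n")
    node.proc.stdin.flush()
    return node.proc.stdout.readline().strip()


def interrupt_all(nodes):
    """Point 5: deliver SIGINT, the signal gdb uses to interrupt execution."""
    for n in nodes:
        n.proc.send_signal(signal.SIGINT)


def stop_all(nodes):
    """Point 4: terminate every live process and collect the exit codes."""
    for n in nodes:
        if n.proc.poll() is None:
            n.proc.terminate()
    codes = [n.proc.wait() for n in nodes]
    for n in nodes:
        n.proc.stdin.close()
        n.proc.stdout.close()
    return codes
```

A front end would call launch_all(), check all_running(), talk to each
process through send_command(), and finish with stop_all(). A real
launcher would also have to deal with remote hosts, startup failures on
individual nodes, and stderr, which is where the details mentioned above
come in.
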
> if you're actually integrated in as a component, then you could get the
> information directly (i.e., via API)...? The possibilities here are
> open.

This also sounds interesting.

Best regards,
Konstantin.