> You and Chris G. raise a good point -- another parallel debugger vendor
> has contacted me about the same issue (their debugger does not have an
> executable named "totalview"). 
> <...>
> Comments?
Actually, the point goes deeper than just a debugger naming question. 
A high-quality MPI implementation should give the user more flexibility. 
Debuggers may differ in their startup algorithms: MPICH (to take the same 
example) lets you write an arbitrary script for starting a custom 
debugger. It works fine, launching this script at run time (there is a 
naming convention), without any need to rebuild and reinstall the library.
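
A minimal sketch of such a run-time lookup by naming convention. The 
"mpirun_dbg.<name>" prefix is what I recall from MPICH's mpirun and should 
be treated as an assumption; find_debugger_script and its parameters are 
hypothetical names used only for illustration.

```python
import os

def find_debugger_script(debugger, search_dirs, prefix="mpirun_dbg."):
    """Return the first executable startup script for `debugger`, or None.

    The script name is built as prefix + debugger, so a new debugger is
    supported by dropping a script into one of search_dirs; nothing is
    rebuilt or reinstalled. The prefix is an assumed convention.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, prefix + debugger)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

The launcher would then execute the returned script instead of a 
hard-coded "totalview" binary.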

> Read the TV specifications about how they attach -- if you have a
> different scheme, let's talk... 
We are not a mature product (like the famous TotalView) that has its own 
requirements. We are also open to discussion.
Currently I have implemented the startup for debugging MPICH programs 
(using the totalview interface; I know how it works), but this scheme may 
be changed.

> > Currently I launch gdb's on remote processes via ssh (as MPICH does), but 
> > probably it will be better to use orte framework capabilities for this. 
> Gotcha; not a bad idea.  Might fit nicely into having support for your
> debugger be a component...?
Actually, all I need is what orte already does:
1. Launch the processes on all nodes.
2. Make sure they launched successfully.
3. Get an array of handles for reading from and writing to each process.
4. Be able to stop the processes.
5. Probably send signals to the processes (gdb uses SIGINT to interrupt 
execution).
6. Probably have the node names and PIDs, to display them and to 
implement items 4-5.
It looks just the same as for a usual run, but the devil is surely in the 
details.
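
The six items above can be sketched for local processes only, using 
Python's subprocess module. This is not how orte does it and it uses no 
orte API; remote launch (ssh or orte) is left out, and all function names 
here are hypothetical.

```python
import signal
import subprocess

def launch_all(commands):
    """Items 1 and 3: start each command; its stdin/stdout pipes are the handles."""
    return [
        subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        for cmd in commands
    ]

def all_running(procs):
    """Item 2: poll() returns None while a process has not exited."""
    return all(p.poll() is None for p in procs)

def describe(procs, host="localhost"):
    """Item 6: (node name, PID) pairs; the host is fixed in this local sketch."""
    return [(host, p.pid) for p in procs]

def interrupt_all(procs):
    """Item 5: send SIGINT, the signal gdb uses to interrupt execution."""
    for p in procs:
        if p.poll() is None:
            p.send_signal(signal.SIGINT)

def stop_all(procs, timeout=5.0):
    """Item 4: ask every process to terminate, then kill any that do not exit."""
    for p in procs:
        if p.poll() is None:
            p.terminate()
    for p in procs:
        try:
            p.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            p.kill()
            p.wait()
        for pipe in (p.stdin, p.stdout):
            if pipe is not None:
                pipe.close()
```

Even in this toy version the details show up quickly: item 2 only proves 
that a process has not exited yet, not that it has reached a useful state.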

> I think the two main things you want are:
>  
> 1. the information about the MPI processes in the ORTE job of interest
> (are you interested in handling MPI-2 dynamic situations?).
Not yet. The first release is planned to support only MPI 1.2.

> 2. <..>
I might also want 3. Get the knowledge of "how it works", to be able to 
play with the code myself :)

> TV's view of the world is to have "one" master debugger that controls all 
> the processes, so having a separate "starter" process in addition to the 
> MPI processes was no big deal.
I'm trying to do the same: attach gdb to each process as a node debugger 
and connect them all to the main debugger process, which has the GUI and 
implements all the "parallel" logic.
The question was merely how to do it: call "gdb orterun" and catch it at 
a breakpoint somewhere, attach to orterun later, or something else.
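
For the "attach later" variant, a minimal sketch of how the main debugger 
might build the command line for each node debugger. It assumes gdb's 
"-p PID" attach option and its MI interpreter ("--interpreter=mi") as the 
channel back to the main process; the MI choice and the name 
gdb_attach_command are my assumptions, not something settled in this thread.

```python
import shlex

def gdb_attach_command(pid, host=None, gdb="gdb"):
    """Build the argv that attaches gdb to `pid`, locally or on `host` via ssh.

    --interpreter=mi selects gdb's machine interface, which a front end
    can drive through pipes.
    """
    local = [gdb, "--interpreter=mi", "-p", str(pid)]
    if host is None:
        return local
    # ssh passes one command string to the remote shell, so quote each word.
    return ["ssh", host, " ".join(shlex.quote(arg) for arg in local)]
```

For example, gdb_attach_command(42, host="node1") gives 
["ssh", "node1", "gdb --interpreter=mi -p 42"]; the node name and PID 
would come from the launcher.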

> if you're actually integrated in as a component, then you could get the 
> information directly (i.e., via API)...?  The possibilities here are 
> open. 
This also sounds interesting.

Best regards,
Konstantin.

