> However, we're quite open to other approaches.  Because of the nature of
> our integration with a variety of different run-time environments, our
> startup is not a shell script -- mpirun ("orterun" is its real name;
> "mpirun" is a sym link to orterun) is a compiled executable.
Sure, I saw that mpirun is the orterun executable :)
And this means that to add any features I would need to rebuild it (and
probably some run-time libraries) each time.

> What are the requirements of your debugger?  Do you attempt to launch
> the MPI processes yourself, or do you attach to them after they are
> launched (which is what TotalView does)?
It is supposed to attach GDB to each process after it has been launched, so
the TotalView interface fits well, except that its details are hardcoded in
the source of orte/tools/orterun (as you may guess, I don't have an
executable named "totalview", etc.). I'd like to know when and where the
functions from orterun/totalview.{h,c} are called, whether I need to write
my own file like this, and so on. In other words, I'm looking for "the
debugger adder reference manual" :)

Currently I launch gdb instances on the remote processes via ssh (as MPICH
does), but it would probably be better to use the orte framework's
capabilities for this. I don't know yet how.

In general, are there any ompi/orte architecture description documents
other than the short diagrams in your publications? Those are too general,
while the sources and doxygen docs are too detailed. Some intermediate "how
all this works together" document is needed to assemble the whole picture.
For my part, I do not understand it completely.

> Open MPI uses orterun as its launcher, not the first MPI process.
> Hence, it is the one that TotalView gets its information from (in that
> sense, it's similar to the MPICH model -- there is one coordinator; it's
> just that it's orterun, not the first MPI process).  Once orterun
> receives notification that all the MPI processes have started, it gives
> the nodename/PID information of each process to TotalView who then
> launches its own debugger processes on those nodes and attaches to the
> processes.  
Hmm... with MPICH I use the first gdb instance to get the info from process
0 and then keep using it as the debugger for that node. Here, will I have
to use one more gdb to get the process table out of the orterun process?
And how can I do this safely?

> You probably get a "stopped" message when you try to bg orterun because
> the shell thinks that it is waiting for input from stdin, because we
> didn't close it.
Actually, this shouldn't matter. Many programs don't close stdin, but
nothing prevents them from running in the background until they try to
read input. The same "Hello world" application runs fine in the background
with MPICH: "mpirun -np 3 a.out &"

Best regards,
Konstantin.


