On Jan 8, 2007, at 2:52 PM, Grobe, Gary L. ((JSC-EV))[ESCG] wrote:
I was wondering if someone could send me the HACKING file so I can do
a bit more with debugging on the snapshots. Our web proxy has webdav
methods turned off (request methods fail) so that I can't get to the
latest of the svn repos.
Bummer. :-( You are definitely falling victim to the fact that our
nightly snapshots have been less-than-stable recently. Sorry [again]
about that!
FWIW, there are two ways to browse the source in the repository without
an SVN checkout:
- you can just point a normal web browser to our SVN repository (I'm
pretty sure that doesn't use DAV, but I'm not 100% sure...), e.g.:
https://svn.open-mpi.org/svn/ompi/trunk/HACKING
- you can use our Trac SVN browser, e.g.:
https://svn.open-mpi.org/trac/ompi/browser/trunk/HACKING
(there's a link at the bottom to download each file without all the
HTML markup).
Second thing. From one of your previous emails, I see that MX
is configured with 4 instances per node. You're running with
exactly 4 processes on the first 2 nodes. Weird things might
happen ...
Just curious about this comment. Are you referring to
oversubscribing?
We run 4 processes on each node because we have 2 dual-core CPUs on
each node. Am I not understanding processor counts correctly?
I'll have to defer to Reese on this one...
PS: Is there any way you can attach to the processes with gdb?
I would like to see the backtrace as shown by gdb in order
to be able to figure out what's wrong there.
When I can get more detailed dbg, I'll send. Though I'm not clear on
what executable is being searched for below.
$ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 \
    -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 \
    --mca pml cm --mca mtl mx ./cpi
FWIW, note that "-dbg" is not a recognized Open MPI mpirun command
line switch -- after all the debugging information, Open MPI finally
gets to telling you:
------------------------------------------------------------------------
Failed to find the following executable:
Host: juggernaut
Executable: -b
Cannot continue.
------------------------------------------------------------------------
So nothing actually ran in this instance.
Our debugging entries on the FAQ
(http://www.open-mpi.org/faq/?category=debugging) are fairly inadequate
at the moment, but if you're running in an ssh environment, you
generally have 2 choices to attach serial debuggers:
1. Put a loop in your app that pauses until you can attach a
debugger. Perhaps something like this:
    { int i = 0; printf("pid %d ready\n", getpid());
      while (0 == i) sleep(5); }
Kludgey and horrible, but it works.
2. mpirun an xterm with gdb. You'll need to specifically use the -d
option to mpirun in order to keep the ssh sessions alive to relay
back your X information, or separately set up your X channels yourself
(e.g., if you're on a closed network, it may be acceptable to "xhost
+" the nodes that you're running on and just manually set up the
DISPLAY variable for the target nodes, perhaps via the -x option to
mpirun) -- in which case you would not need to use the -d option to
mpirun.
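
For option 1, here is a slightly fuller sketch of the pause loop, in case
it helps. The helper name wait_for_debugger and the WAIT_FOR_GDB
environment variable are made up for illustration; neither is part of
Open MPI. The flag is declared volatile so that an optimizing compiler
keeps re-reading it, which lets you release the loop from gdb.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): pause until a debugger
 * attaches and clears the flag. Returns 1 if it paused, 0 if skipped. */
static int wait_for_debugger(void)
{
    /* volatile: force the loop to re-read the flag on every pass */
    volatile int attached = 0;

    /* WAIT_FOR_GDB is an illustrative name; only pause when it is set */
    if (getenv("WAIT_FOR_GDB") == NULL) {
        return 0;
    }

    printf("pid %d ready\n", (int) getpid());
    fflush(stdout);   /* make sure the pid shows up before we sleep */

    while (0 == attached) {
        sleep(5);
    }
    return 1;
}
```

You would call wait_for_debugger() early in main(), e.g. right after
MPI_Init(), and export the variable with the same -x mechanism used in
the command above (for instance "-x WAIT_FOR_GDB=1"). Then, on the
target node, attach with "gdb -p <pid>", move up to the
wait_for_debugger frame, run "set var attached = 1", and "continue".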
Make sense?
--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems