On Jan 8, 2007, at 2:52 PM, Grobe, Gary L. ((JSC-EV))[ESCG] wrote:

I was wondering if someone could send me the HACKING file so I can do a
bit more with debugging on the snapshots. Our web proxy has webdav
methods turned off (request methods fail) so that I can't get to the
latest of the svn repos.

Bummer. :-( You are definitely falling victim to the fact that our nightly snapshots have been less-than-stable recently. Sorry [again] about that!

FWIW, there are two ways to browse the source in the repository without an SVN checkout:

- you can just point a normal web browser to our SVN repository (I'm pretty sure that doesn't use DAV, but I'm not 100% sure...), e.g.: https://svn.open-mpi.org/svn/ompi/trunk/HACKING

- you can use our Trac SVN browser, e.g.: https://svn.open-mpi.org/trac/ompi/browser/trunk/HACKING (there's a link at the bottom to download each file without all the HTML markup).

Second thing. From one of your previous emails, I see that MX
is configured with 4 instances per node. You're running with
exactly 4 processes on the first 2 nodes. Weird things might
happen ...

Just curious about this comment. Are you referring to oversubscribing?
We run 4 processes on each node because we have 2 dual-core CPUs on
each node. Am I not understanding processor counts correctly?

I'll have to defer to Reese on this one...

PS: Is there any way you can attach to the processes with gdb?
I would like to see the backtrace as shown by gdb in order
to be able to figure out what's wrong there.

When I can get more detailed dbg, I'll send. Though I'm not clear on
what executable is being searched for below.

$ mpirun -dbg=gdb --prefix /usr/local/openmpi-1.2b3r13030 -x
LD_LIBRARY_PATH=${LD_LIBRARY_PATH} --hostfile ./h1-3 -np 5 --mca pml cm
--mca mtl mx ./cpi

FWIW, note that "-dbg" is not a recognized Open MPI mpirun command line switch -- after all the debugging information, Open MPI finally gets to telling you:

------------------------------------------------------------------------
Failed to find the following executable:

Host:       juggernaut
Executable: -b

Cannot continue.
------------------------------------------------------------------------

So nothing actually ran in this instance.

Our debugging entries on the FAQ (http://www.open-mpi.org/faq/?category=debugging) are fairly inadequate at the moment, but if you're running in an ssh environment, you generally have 2 choices to attach serial debuggers:

1. Put a loop in your app that pauses until you can attach a debugger. Perhaps something like this:

{ volatile int i = 0; printf("pid %d ready\n", (int) getpid()); fflush(stdout); while (0 == i) sleep(5); }

Kludgey and horrible, but it works.

2. mpirun an xterm with gdb. You'll need to use the -d option to mpirun to keep the ssh sessions alive so that they can relay your X information back. Alternatively, you can set up your X channels yourself, in which case you do not need -d: for example, on a closed network it may be acceptable to "xhost +" the nodes that you're running on and manually set the DISPLAY variable for the target nodes, perhaps via the -x option to mpirun.
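
For option 1, a slightly fuller version of the pause loop might look like the sketch below. The WAIT_FOR_DEBUGGER environment variable and the two function names are my own inventions for illustration, not anything Open MPI provides. The idea is that the loop only engages when the variable is set, so you can leave the call in your code and switch it on per run (e.g., by exporting the variable to the remote nodes with mpirun's -x option).

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if the (hypothetical) WAIT_FOR_DEBUGGER environment variable
 * is set to a non-empty value other than "0", and 0 otherwise. */
int debugger_wait_requested(void)
{
    const char *v = getenv("WAIT_FOR_DEBUGGER");
    return (v != NULL && v[0] != '\0' && !(v[0] == '0' && v[1] == '\0'));
}

/* If requested, print the pid and host name, then sleep in a loop until
 * someone attaches a debugger and sets `released` to a nonzero value. */
void wait_for_debugger(void)
{
    /* volatile: the compiler must re-read the variable on every pass,
     * so a change made from the debugger actually ends the loop. */
    volatile int released = 0;
    char host[256];

    if (!debugger_wait_requested())
        return;

    if (gethostname(host, sizeof(host)) != 0)
        host[0] = '\0';
    host[sizeof(host) - 1] = '\0';

    printf("pid %d on %s ready for attach\n", (int) getpid(), host);
    fflush(stdout);  /* make sure the line is relayed before we block */

    while (0 == released)
        sleep(5);
}
```

You would call wait_for_debugger() early in main(), e.g., right after MPI_Init(). Then ssh to the node, run "gdb -p <pid>", go "up" until you are in the wait_for_debugger frame, "set var released = 1", and continue or set breakpoints from there.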

Make sense?

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
