For some reason, OMPI appears to
have decided that it had not yet received the message. Perhaps a
memory bug in your application...? Have you run it through valgrind,
or some other memory-checking debugger, perchance?
On Oct 18, 2007, at 12:35 PM, Daniel Rozenbaum wrote:
Unfortunately, so far [...]
[...] "complete == false" and calls opal_condition_wait().
Jeff Squyres wrote:
Can you send a short test program that shows this problem, perchance?
On Oct 3, 2007, at 1:41 PM, Daniel Rozenbaum wrote:
Hi again,
I'm trying to debug the problem I posted
about several times recently; I thought I'd try asking a more focused
question:
I have the following sequence in the client code:
MPI_Status stat;
ret = MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
assert(ret == MPI_SUCCESS);
ret = MPI_Recv([...]); /* truncated in the archive; presumably the receive matching the probe above */
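For reference, written out in full, that probe-then-receive pattern might look like the sketch below (a minimal stand-alone program; the tag, the MPI_BYTE payload, and the buffer handling are illustrative assumptions, not taken from Daniel's actual code):

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, ret;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* "server" side: send a small tagged message to rank 1 */
        char msg[] = "hello";
        MPI_Send(msg, (int)sizeof(msg), MPI_BYTE, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* "client" side: probe first to learn the size, then receive */
        MPI_Status stat;
        int count;
        ret = MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
        assert(ret == MPI_SUCCESS);
        ret = MPI_Get_count(&stat, MPI_BYTE, &count);
        assert(ret == MPI_SUCCESS);
        char *buf = malloc(count);
        ret = MPI_Recv(buf, count, MPI_BYTE, stat.MPI_SOURCE, stat.MPI_TAG,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        assert(ret == MPI_SUCCESS);
        printf("received %d bytes\n", count);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}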
[...]Send()'s and
three unprocessed Irecv()'s.
I've upgraded to Open MPI 1.2.4, but this made no difference.
Are there any internal logging or debugging facilities in Open MPI that
would allow me to further track the calls that eventually result in the
error in mca_btl_tcp_frag_recv()?
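(One application-level facility that can help here, not OMPI-internal tracing, is making errors on the communicator returnable and decoding them with MPI_Error_string rather than letting the job abort; Open MPI also has MCA verbosity parameters such as btl_base_verbose that can be raised on the mpirun command line. A minimal sketch of the error-handler approach; the deliberately out-of-range destination rank is purely illustrative:)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int size, rc;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Have errors on this communicator return codes instead of aborting. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Deliberately out-of-range destination rank, to trigger an error. */
    rc = MPI_Send(NULL, 0, MPI_BYTE, size, 0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_Send failed: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}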
Thanks[...]
[...] at the beginning of the run and
are processed correctly, though.
Also, I ran the same experiment on another cluster that uses slightly
different
hardware and network infrastructure, and could not reproduce the
problem.
Hope at least some of the above makes some sense. Any additional advice
would be greatly appreciated.
[...], and those seem to have kept working all along, until
the app got stuck.
Once this valgrind experiment is over, I'll proceed to your other
suggestion about the debug loop on the server side that checks whether
any of the requests the app is waiting on is MPI_REQUEST_NULL.
Many thanks,
Daniel
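For reference, such a debug loop might look roughly like the following (a sketch; the array name reqs and its length n are placeholders, not from the actual application):

#include <stdio.h>
#include <mpi.h>

/* Sketch of the suggested check: before (or inside) the MPI_Waitany loop,
 * scan the request array for slots that have already become
 * MPI_REQUEST_NULL.  reqs and n stand in for the app's own array. */
static void check_requests(const MPI_Request *reqs, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (reqs[i] == MPI_REQUEST_NULL) {
            fprintf(stderr, "debug: request %d is MPI_REQUEST_NULL\n", i);
        }
    }
}

MPI_Waitany simply ignores MPI_REQUEST_NULL entries (and returns an index of MPI_UNDEFINED once all of them are null), so a request that gets nulled out unexpectedly can throw off a loop that counts completions.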
Do you know how your process is exiting? If a process dies via
signal, OMPI *should* be seeing that and cleaning up the whole job
properly.
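One way to find out, assuming the process does die via a signal, is to install handlers early in main() that log the signal before re-raising it; a sketch (the signal list is illustrative):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Log the fatal signal, then re-raise it with the default disposition so
 * the process still dies (and OMPI still sees it) the way it would have.
 * snprintf() is not strictly async-signal-safe, but is usually fine for
 * one-off debugging like this. */
static void fatal_handler(int sig)
{
    char msg[64];
    int len = snprintf(msg, sizeof(msg), "caught fatal signal %d\n", sig);
    if (len > 0)
        write(STDERR_FILENO, msg, (size_t)len);
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Early in main(), after MPI_Init: */
/* signal(SIGSEGV, fatal_handler); signal(SIGBUS, fatal_handler); */
/* signal(SIGABRT, fatal_handler); signal(SIGFPE, fatal_handler); */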
On Sep 12, 2007, at 10:50 PM, Daniel Rozenbaum wrote:
Hello,
I'm working on an MPI application for which I recently started using Open MPI
instead of LAM/MPI. Both with Open MPI and LAM/MPI it mostly runs ok, but
there are a number of cases in which the application terminates abnormally
when using LAM/MPI, and hangs when using Open MPI. I haven't [...]