Re: [OMPI users] MPI_Probe succeeds, but subsequent MPI_Recv gets stuck

2007-10-18 Thread Daniel Rozenbaum
ome reason, OMPI appears to have decided that it had not yet received the message. Perhaps a memory bug in your application...? Have you run it through valgrind, or some other memory-checking debugger, perchance? On Oct 18, 2007, at 12:35 PM, Daniel Rozenbaum wrote: Unfortunately, so far

Re: [OMPI users] MPI_Probe succeeds, but subsequent MPI_Recv gets stuck

2007-10-18 Thread Daniel Rozenbaum
omplete == false" and calls opal_condition_wait(). Jeff Squyres wrote: Can you send a short test program that shows this problem, perchance? On Oct 3, 2007, at 1:41 PM, Daniel Rozenbaum wrote: Hi again, I'm trying to debug the problem I posted about several times recently; I thought I

[OMPI users] MPI_Probe succeeds, but subsequent MPI_Recv gets stuck

2007-10-03 Thread Daniel Rozenbaum
Hi again, I'm trying to debug the problem I posted about several times recently; I thought I'd try asking a more focused question. I have the following sequence in the client code: MPI_Status stat; ret = MPI_Probe(0, MPI_ANY_TAG, MPI_COMM_WORLD, &stat); assert(ret == MPI_SUCCESS); ret = M

Re: [OMPI users] Application using OpenMPI 1.2.3 hangs, error messages in mca_btl_tcp_frag_recv

2007-09-28 Thread Daniel Rozenbaum
d()'s and three unprocessed Irecv()'s. I've upgraded to Open MPI 1.2.4, but this made no difference. Are there any internal logging or debugging facilities in Open MPI that would allow me to further track the calls that eventually result in the error in mca_btl_tcp_frag_recv()? Tha

Re: [OMPI users] Application using OpenMPI 1.2.3 hangs, error messages in mca_btl_tcp_frag_recv

2007-09-27 Thread Daniel Rozenbaum
at the beginning of the run and are processed correctly though. Also, I ran the same experiment on another cluster that uses slightly different hardware and network infrastructure, and could not reproduce the problem. Hope at least some of the above makes some sense. Any additional advice would be greatl

Re: [OMPI users] Application using OpenMPI 1.2.3 hangs, error messages in mca_btl_tcp_frag_recv

2007-09-19 Thread Daniel Rozenbaum
t, and those seem to have kept working all along, until the app got stuck. Once this valgrind experiment is over, I'll proceed to your other suggestion: a debug loop on the server side that checks whether any of the requests the app is waiting on is MPI_REQUEST_NULL. Many thanks, Daniel

Re: [OMPI users] Application using OpenMPI 1.2.3 hangs, error messages in mca_btl_tcp_frag_recv

2007-09-17 Thread Daniel Rozenbaum
know how your process is exiting? If a process dies via signal, OMPI *should* be seeing that and cleaning up the whole job properly. On Sep 12, 2007, at 10:50 PM, Daniel Rozenbaum wrote: Hello, I'm working on an MPI application for which I recently started using Open MPI instead of LA

[OMPI users] Application using OpenMPI 1.2.3 hangs, error messages in mca_btl_tcp_frag_recv

2007-09-12 Thread Daniel Rozenbaum
Hello, I'm working on an MPI application for which I recently started using Open MPI instead of LAM/MPI. With both Open MPI and LAM/MPI it mostly runs OK, but there are a number of cases in which the application terminates abnormally when using LAM/MPI, and hangs when using Open MPI. I haven'