Hello,
I am working on a hybrid MPI (OpenMPI 1.4.3) and Pthreads code. I am
using MPI_Isend and MPI_Irecv for communication and MPI_Test/MPI_Wait to
check whether it has completed. I do this repeatedly in the outer loop
of my code; MPI_Test is used in the inner loop to check whether a
function that depends on the received data can be called.
The program regularly crashed (but only when not using printf...), and
after debugging it I found the following problem:
In MPI_Isend I get an invalid read of memory. I worked around the
problem by not re-using a
MPI_Request req_s, req_r;
but by using
MPI_Request* req_s;
MPI_Request* req_r;
and re-allocating them before each MPI_Isend/MPI_Irecv.
The documentation says that MPI_Wait and MPI_Test (when successful)
deallocate the request object and set the handle to MPI_REQUEST_NULL.
It also says that MPI_Isend and MPI_Irecv allocate the object and
associate it with the request handle.
As I understand this, it means either that I can pass an MPI_Request*
pointer that I have not initialized (this doesn't work, it crashes), or
that I can pass a pointer that I have initialized with
malloc(sizeof(MPI_Request)) (or pass the address of a stack variable
MPI_Request req), which the functions then set and reset. But this
version crashes, too.
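For reference, here is the reuse idiom I would have expected to work,
boiled down to a minimal single-threaded sketch (buffer size, peer
pairing, tag and iteration count are arbitrary placeholders, not my
actual code):

#include <mpi.h>
int main(int argc, char **argv)
{
    double sbuf[16], rbuf[16];
    MPI_Request req_s, req_r;  /* plain variables, reused every iteration */
    int rank, peer, iter;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = rank ^ 1;  /* pair up ranks 0<->1, 2<->3, ... */

    for (iter = 0; iter < 100; iter++) {
        MPI_Isend(sbuf, 16, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req_s);
        MPI_Irecv(rbuf, 16, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req_r);
        /* ... computation overlapping the communication ... */
        MPI_Wait(&req_s, MPI_STATUS_IGNORE); /* sets req_s to MPI_REQUEST_NULL */
        MPI_Wait(&req_r, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}
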
What works is using a pointer that I allocate before the
MPI_Isend/MPI_Irecv and free after the MPI_Wait in every iteration. In
other words: it only works if I don't reuse any kind of MPI_Request, but
create a fresh one every time.
Is this how it is supposed to be? I would expect that reusing the memory
would be a lot more efficient (fewer calls to malloc...). Am I missing
something here, or am I doing something wrong?
Let me provide some more detailed information about my problem:
I am running the program on a 30-node InfiniBand cluster. Each node has
4 single-core Opteron CPUs. I am running 1 MPI rank per node and 4
threads per rank (one thread per core).
I am compiling with OpenMPI's mpicc, which uses gcc underneath.
Some pseudo-code of the program can be found at the end of this e-mail.
I was able to reproduce the problem with different numbers of nodes and
even on a single node. The problem does not arise when I put printf
debugging information into the code. This pointed me in the direction
of a memory problem, where some write access touches memory it is not
supposed to.
I ran the tests under valgrind with --leak-check=full and
--show-reachable=yes, which pointed me either to MPI_Isend or to
MPI_Wait, depending on whether I had the threads spin in a loop until
MPI_Test returned success, or used MPI_Wait, respectively.
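The invocation looked roughly like this (program name and rank count are
placeholders):

mpirun -np 1 valgrind --leak-check=full --show-reachable=yes ./my_program
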
I would appreciate your help with this. Am I missing something important
here? Is there a way to reuse the request across iterations other than
the way I thought it should work?
Or is there a way to re-initialize the allocated memory before the
MPI_Isend/MPI_Irecv so that I at least don't have to call free and
malloc each time?
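What I have in mind is something like the following (a sketch only; I am
assuming that resetting the handle to MPI_REQUEST_NULL is the right kind
of re-initialization, if there is one):

*req_s = MPI_REQUEST_NULL; /* reset instead of free() and malloc() */
*req_r = MPI_REQUEST_NULL;
MPI_Isend(..., req_s);
MPI_Irecv(..., req_r);
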
Thank you very much for your help!
Kind regards,
David Büttner
_____________________
Pseudo-Code of program:
MPI_Request* req_s;
MPI_Request* req_r;
OUTER-LOOP
{
    if (0 == threadid)
    {
        req_s = malloc(sizeof(MPI_Request));
        req_r = malloc(sizeof(MPI_Request));
        MPI_Isend(..., req_s);
        MPI_Irecv(..., req_r);
    }
    pthread_barrier_wait
    INNER-LOOP (while NOT_DONE or RET)
    {
        if (TRYLOCK && NOT_DONE)
        {
            if (MPI_Test(req_r) reports completion)
            {
                Call_Function_A;   /* depends on the received data */
                NOT_DONE = 0;
            }
        }
        RET = Call_Function_B;
    }
    pthread_barrier_wait
    if (0 == threadid)
    {
        MPI_Wait(req_s);
        MPI_Wait(req_r);
        free(req_s);
        free(req_r);
    }
}
_____________
--
David Büttner, Informatik, Technische Universität München
TUM I-10 - FMI 01.06.059 - Tel. 089 / 289-17676