Guillaume THOMAS-COLLIGNON wrote:
Hi,
I wrote an application which works fine on a small number of nodes
(e.g. 4), but it crashes on a large number of CPUs.
In this application, all the slaves send many small messages to the
master. I use the regular MPI_Send, and since the messages are
relatively small (1 int, then many times 3296 ints), OpenMPI does a
very good job at sending them asynchronously, and it maxes out the
gigabit link on the master node. I'm very happy with this behaviour:
it gives me the same performance as if I were doing all the
asynchronous stuff myself, and the code remains simple.
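(For comparison, "doing the asynchronous stuff myself" would look roughly
like the hypothetical helper below. This is only a sketch, not part of the
test case further down: it posts one MPI_Isend per block and then waits for
all of them, and it assumes each block lives in its own buffer, since a
buffer handed to MPI_Isend must not be reused until its request completes.)

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical helper (sketch): send nblocks blocks of blocksize ints
   each with nonblocking sends, then wait for all of them to complete. */
static int send_blocks_async (int **blocks, int nblocks, int blocksize,
                              int dest, int tag, MPI_Comm comm)
{
  MPI_Request *reqs = (MPI_Request *) malloc (nblocks * sizeof (MPI_Request));
  int i, ier;

  if (reqs == NULL)
    return MPI_ERR_NO_MEM;
  /* Post all the sends without waiting */
  for (i = 0; i < nblocks; i++)
    MPI_Isend (blocks[i], blocksize, MPI_INT, dest, tag, comm, &reqs[i]);
  /* Block until every posted send has completed */
  ier = MPI_Waitall (nblocks, reqs, MPI_STATUSES_IGNORE);
  free (reqs);
  return ier;
}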
But it crashes when there are too many slaves.
How many is too many? I successfully ran your code on 96 nodes with 4
processes per node, and it seemed to work fine. Also, what network are
you using?
So it looks like at
some point the master node runs out of buffers and the job crashes
brutally.
What do you mean by crashing? Is there a segfault or an error message?
Tim
That's my understanding but I may be wrong.
If I use explicit synchronous sends (MPI_Ssend), it does not crash
anymore, but the performance is a lot lower.
I have two questions regarding this:
1) What kind of tuning would help handle more messages and keep the
master from crashing?
2) Is this the expected behaviour? I don't think my code is doing
anything wrong, so I would not expect a brutal crash.
The workaround I've found so far is to do an MPI_Ssend for the
request, then use MPI_Send for the data blocks. All the slaves are
then blocked on the request, which keeps the master from being
flooded, and the performance is still good. But nothing tells me it
won't crash at some point if I have more data blocks in my real code,
so I'd like to know more about what's happening here.
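Concretely, the workaround only touches the request send in the slave loop
of the code below (same tag and arguments as in the reproducer):

  /* Workaround: make the request synchronous, so each slave blocks until
     the master has actually posted the matching receive; the data blocks
     that follow still go out with plain MPI_Send. */
  ier = MPI_Ssend (&request, 1, MPI_INT, 0, 964, MPI_COMM_WORLD);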
Thanks,
-Guillaume
Here is the code, so you get a better idea of the communication
scheme, or in case someone wants to reproduce the problem.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#define BLOCKSIZE 3296
#define MAXBLOCKS 1000
#define NLOOP 4
int main (int argc, char **argv) {
  int i, j, ier, rank, npes, slave, request;
  int *data;
  MPI_Status status;

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);
  MPI_Comm_size (MPI_COMM_WORLD, &npes);

  if ((data = (int *) calloc (BLOCKSIZE, sizeof (int))) == NULL)
    return -10;

  // Master
  if (rank == 0) {
    // Expect (NLOOP * number of slaves) requests
    for (i=0; i<(npes-1)*NLOOP; i++) {
      /* Wait for a request from any slave. Request contains the number
         of data blocks */
      ier = MPI_Recv (&request, 1, MPI_INT, MPI_ANY_SOURCE, 964,
                      MPI_COMM_WORLD, &status);
      if (ier != MPI_SUCCESS)
        return -1;
      slave = status.MPI_SOURCE;
      printf ("Master : request for %d blocks from slave %d\n",
              request, slave);
      /* Receive the data blocks from this slave */
      for (j=0; j<request; j++) {
        ier = MPI_Recv (data, BLOCKSIZE, MPI_INT, slave, 993,
                        MPI_COMM_WORLD, &status);
        if (ier != MPI_SUCCESS)
          return -2;
      }
    }
  }
  // Slaves
  else {
    for (i=0; i<NLOOP; i++) {
      /* Send the request = number of blocks we want to send to the
         master */
      request = MAXBLOCKS;
      /* Changing this MPI_Send to MPI_Ssend is enough to keep the master
         from being flooded */
      ier = MPI_Send (&request, 1, MPI_INT, 0, 964, MPI_COMM_WORLD);
      if (ier != MPI_SUCCESS)
        return -3;
      /* Send the data blocks */
      for (j=0; j<request; j++) {
        ier = MPI_Send (data, BLOCKSIZE, MPI_INT, 0, 993, MPI_COMM_WORLD);
        if (ier != MPI_SUCCESS)
          return -4;
      }
    }
  }

  printf ("Node %d done\n", rank);
  free (data);
  MPI_Finalize ();
  return 0;
}
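To build and run the reproducer with Open MPI, something along these lines
should do (the file name and process count are arbitrary):

  mpicc flood.c -o flood
  mpirun -np <number_of_processes> ./flood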