Josh,

Some time ago I was studying the CRCP component. I'm not sure, but I remember that this component is used for bookmark exchange. Do you store this information exactly for that purpose (bookmark exchange)? Can this memory be freed after a successful checkpoint operation?

Thanks,
Leonardo

Josh Hursey wrote:
Leonardo,

You are exactly correct. The CRCP module/component will grow the application size probably for every message that you send or receive. This is because the CRCP component tracks the signature {data_size, tag, communicator, peer} (*not* the contents of the message) of every message sent/received.
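
To make the growth concrete, here is a minimal sketch in C of per-message signature tracking. It is not the Open MPI CRCP code: the names msg_sig, sig_log, sig_log_record, sig_log_bytes and sig_log_clear are hypothetical, and the communicator is represented by a plain int for illustration. It only shows that storing one small record per message makes memory use grow linearly with the number of messages, even though no message contents are kept.

```c
#include <stdlib.h>

/* Hypothetical record holding the four signature fields; no payload is stored. */
typedef struct msg_sig {
    size_t data_size;      /* size of the message in bytes */
    int tag;               /* message tag */
    int communicator;      /* stand-in for a communicator handle (assumption) */
    int peer;              /* rank of the other process */
    struct msg_sig *next;  /* next record in the log */
} msg_sig;

/* Hypothetical log: a singly linked list plus a record counter. */
typedef struct {
    msg_sig *head;
    size_t count;
} sig_log;

/* Record one sent or received message. Returns 0 on success, -1 if malloc fails. */
static int sig_log_record(sig_log *log, size_t data_size, int tag,
                          int communicator, int peer)
{
    msg_sig *s = malloc(sizeof *s);
    if (s == NULL) {
        return -1;
    }
    s->data_size = data_size;
    s->tag = tag;
    s->communicator = communicator;
    s->peer = peer;
    s->next = log->head;
    log->head = s;
    log->count++;
    return 0;
}

/* Bytes held by the records themselves (allocator overhead not included). */
static size_t sig_log_bytes(const sig_log *log)
{
    return log->count * sizeof(msg_sig);
}

/* Free every record. Illustrative cleanup only. */
static void sig_log_clear(sig_log *log)
{
    while (log->head != NULL) {
        msg_sig *s = log->head;
        log->head = s->next;
        free(s);
    }
    log->count = 0;
}
```

With a log initialized as `sig_log log = {NULL, 0};`, calling sig_log_record once per MPI_Send or MPI_Recv makes sig_log_bytes grow by sizeof(msg_sig) each time, which is the same shape of growth as in the ping test.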

I have in development some fixes for the CRCP component to make it behave a bit better for large numbers of messages, and as a result will also help control the number of memory allocations needed by this component. Unfortunately it is not 100% ready for public use at the moment, but hopefully soon.

As an aside: to clearly see the effect of turning the CRCP component on/off at runtime try the two commands below:
Without CRCP:
  shell$ mpirun -np 2 -am ft-enable-cr -mca crcp none simple-ping 20 1
With CRCP:
  shell$ mpirun -np 2 -am ft-enable-cr simple-ping 20 1

-- Josh

On May 29, 2008, at 7:54 AM, Leonardo Fialho wrote:

Hi All,

I ran some tests with a dummy "ping" application and saw some memory problems. The tests gave the following results:

1) OpenMPI (without FT):
- delaying 1 second before sending the token to the other node: orted and application size stable;
- delaying 0 seconds before sending the token to the other node: orted and application size stable.

2) OpenMPI (with CRCP FT):
- delaying 1 second before sending the token to the other node: orted stable; application size grows during the first seconds and then stabilizes;
- delaying 0 seconds before sending the token to the other node: orted stable; application size keeps growing all the time.

I think that it is something in the CRCP module/component...
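
One way to watch this growth from inside the application, rather than from outside with a process monitor, is sketched below. This is only an illustration under assumptions: the helper name peak_rss is hypothetical, and it relies on the POSIX getrusage() call, whose ru_maxrss field is reported in kilobytes on Linux (other platforms may use different units). It reports the peak resident set size, so a steadily rising value indicates continued growth.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: peak resident set size of this process as reported
 * by getrusage() (kilobytes on Linux), or -1 if the call fails. */
static long peak_rss(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    return ru.ru_maxrss;
}

/* Print the current iteration number together with the peak RSS. */
static void report_rss(int iteration)
{
    printf("iteration %d: peak RSS %ld\n", iteration, peak_rss());
    fflush(stdout);
}
```

Calling report_rss(fim) every few iterations of the ping loop would show whether the value keeps rising with the number of messages.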

Thanks,

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sleep() */

int main (int argc, char *argv[]) {
   double time_end, time_start;
   int count, rank, fim, times, delay, next, prev;
   char buffer[] = "test!";   /* only the 5 characters are sent, not the NUL */
   MPI_Status status;

   if (3 > argc) {
       printf("\n Insufficient arguments (%d)\n\n ping <times> <delay>\n\n", argc);
       exit(1);
   }

   if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
       /* Without a working MPI, rank and the timers below are meaningless. */
       exit(1);
   }

   times = atoi(argv[1]);   /* number of laps around the ring */
   delay = atoi(argv[2]);   /* seconds to wait before each send */

   time_start = MPI_Wtime();
   MPI_Comm_size (MPI_COMM_WORLD, &count);
   MPI_Comm_rank (MPI_COMM_WORLD, &rank );
   next = (rank + 1) % count;           /* ring neighbour that receives from us */
   prev = (rank + count - 1) % count;   /* ring neighbour that sends to us */

   for (fim = 1; fim <= times; fim++) {
       if (rank == 0) {
           /* Rank 0 starts each lap, then waits for the token to come back. */
           printf("(%d) sent token to (%d)\n", rank, next);
           fflush(stdout);
           sleep(delay);
           MPI_Send(buffer, 5, MPI_CHAR, next, 1, MPI_COMM_WORLD);
           MPI_Recv(buffer, 5, MPI_CHAR, prev, 1, MPI_COMM_WORLD, &status);
       } else {
           /* Every other rank waits for the token and passes it on. */
           MPI_Recv(buffer, 5, MPI_CHAR, prev, 1, MPI_COMM_WORLD, &status);
           printf("(%d) sent token to (%d)\n", rank, next);
           fflush(stdout);
           sleep(delay);
           MPI_Send(buffer, 5, MPI_CHAR, next, 1, MPI_COMM_WORLD);
       }
   }

   time_end = MPI_Wtime();
   MPI_Finalize();

   if (rank == 0) {
       printf("%f\n", time_end - time_start);
   }

   return 0;
}
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



