Leonardo,
You are exactly right. The CRCP module/component will most likely grow
the application's memory footprint with every message that you send or
receive. This is because the CRCP component tracks the signature
{data_size, tag, communicator, peer} (*not* the contents of the message)
of every message sent/received.
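To make the growth concrete, here is a purely illustrative sketch of per-message signature bookkeeping. It is not the actual CRCP code: the names msg_sig_t, sig_log_t, sig_log_add, sig_log_bytes and sig_log_clear are hypothetical, and the communicator is reduced to a plain integer id. It only shows that one small record per message adds up when records are appended faster than they are released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: the message signature only, never the payload. */
typedef struct msg_sig {
    size_t data_size;      /* number of bytes in the message */
    int tag;               /* MPI tag */
    int comm_id;           /* stand-in for the communicator (assumption) */
    int peer;              /* rank of the other side */
    struct msg_sig *next;  /* next record in the log */
} msg_sig_t;

/* Hypothetical log: a singly linked list plus a record counter. */
typedef struct {
    msg_sig_t *head;
    size_t count;
} sig_log_t;

/* Record one sent/received message.
 * Returns 0 on success, -1 if the allocation fails. */
int sig_log_add(sig_log_t *log, size_t data_size, int tag,
                int comm_id, int peer)
{
    assert(log != NULL);
    msg_sig_t *s = malloc(sizeof *s);
    if (s == NULL) {
        return -1;
    }
    s->data_size = data_size;
    s->tag = tag;
    s->comm_id = comm_id;
    s->peer = peer;
    s->next = log->head;
    log->head = s;
    log->count++;
    return 0;
}

/* Memory held by the records themselves (allocator overhead excluded). */
size_t sig_log_bytes(const sig_log_t *log)
{
    assert(log != NULL);
    return log->count * sizeof(msg_sig_t);
}

/* Release every record, e.g. once the signatures are no longer needed. */
void sig_log_clear(sig_log_t *log)
{
    assert(log != NULL);
    while (log->head != NULL) {
        msg_sig_t *s = log->head;
        log->head = s->next;
        free(s);
    }
    log->count = 0;
}
```

With a log like this, the footprint is proportional to the number of messages recorded since the last sig_log_clear(), regardless of how small each message is.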
I have some fixes for the CRCP component in development that make it
behave a bit better for large numbers of messages and, as a result,
should also help control the number of memory allocations this
component needs. Unfortunately they are not 100% ready for public use
at the moment, but hopefully they will be soon.
As an aside: to clearly see the effect of turning the CRCP component
on/off at runtime try the two commands below:
Without CRCP:
shell$ mpirun -np 2 -am ft-enable-cr -mca crcp none simple-ping 20 1
With CRCP:
shell$ mpirun -np 2 -am ft-enable-cr simple-ping 20 1
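If it helps to watch the growth from inside the application rather than with an external tool, a small helper along these lines could be called once per loop iteration. This is only a sketch: peak_rss and report_peak_rss are hypothetical names, and it assumes a POSIX system where getrusage() fills in ru_maxrss (kilobytes on Linux, bytes on Mac OS X).

```c
#include <stdio.h>
#include <sys/resource.h>

/* Peak resident set size of the calling process, in the platform's
 * ru_maxrss units (kilobytes on Linux). Returns -1 on error. */
long peak_rss(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        return -1;
    }
    return ru.ru_maxrss;
}

/* Print one line per call so runs with and without CRCP can be compared. */
void report_peak_rss(int rank, int iteration)
{
    printf("(%d) iteration %d: peak RSS %ld\n", rank, iteration, peak_rss());
    fflush(stdout);
}
```

Because ru_maxrss is a high-water mark, the value never goes down; a number that keeps climbing across iterations is what indicates steady growth.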
-- Josh
On May 29, 2008, at 7:54 AM, Leonardo Fialho wrote:
Hi All,
I ran some tests with a dummy "ping" application and ran into some
memory problems. In these tests I obtained the following results:
1) OpenMPI (without FT):
- delaying 1 second before sending the token to the other node: orted
and application size stable;
- delaying 0 seconds before sending the token to the other node: orted
and application size stable.
2) OpenMPI (with CRCP FT):
- delaying 1 second before sending the token to the other node: orted
stable; application size grows during the first seconds and then
stabilizes;
- delaying 0 seconds before sending the token to the other node: orted
stable; application size grows all the time.
I think it is something in the CRCP module/component...
Thanks,
--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
#include <mpi.h>      /* found via mpicc; no absolute path needed */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sleep() */

int main (int argc, char *argv[]) {
    double time_end, time_start;
    int count, rank, fim;
    char buffer[] = "test!";   /* only the 5 characters are sent */
    MPI_Status status;

    if (3 > argc) {
        printf("\n Insufficient arguments (%d)\n\n"
               " ping <times> <delay>\n\n", argc);
        exit(1);
    }
    /* Bail out early so nothing below runs without a working MPI. */
    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed\n");
        exit(1);
    }
    time_start = MPI_Wtime();
    MPI_Comm_size (MPI_COMM_WORLD, &count);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank );
    /* Pass the token around the ring argv[1] times. */
    for (fim = 1; fim <= atoi(argv[1]); fim++) {
        if (rank == 0) {
            /* Rank 0 starts each round, then waits for the last rank. */
            printf("(%d) sent token to (%d)\n", rank, rank+1);
            fflush(stdout);
            sleep(atoi(argv[2]));
            MPI_Send(buffer, 5, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
            MPI_Recv(buffer, 5, MPI_CHAR, count-1, 1,
                     MPI_COMM_WORLD, &status);
        } else {
            /* Other ranks wait for the token, then forward it. */
            MPI_Recv(buffer, 5, MPI_CHAR, rank-1, 1,
                     MPI_COMM_WORLD, &status);
            printf("(%d) sent token to (%d)\n", rank,
                   (rank==(count-1) ? 0 : rank+1));
            fflush(stdout);
            sleep(atoi(argv[2]));
            MPI_Send(buffer, 5, MPI_CHAR,
                     (rank==(count-1) ? 0 : rank+1), 1, MPI_COMM_WORLD);
        }
    }
    time_end = MPI_Wtime();
    MPI_Finalize();
    if (rank == 0) {
        printf("%f\n", time_end - time_start);
    }
    return 0;
}
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users