Dear users,
I'm running the popular Calculate PI program on a two-node setup running
Ubuntu 8.10 and Open MPI 1.3.3 (with default settings). Password-less SSH
is set up, but there is no cluster management software such as a network
file system, network time protocol, resource manager, or scheduler. The
two nodes are connected through TCP/IP only.
When I benchmarked the program, the time spent in MPI_Reduce() turned out
to be proportional to the number of intervals (n) used in the
calculation. For example, with n = 1,000,000, MPI_Reduce takes 15.65
milliseconds, while with n = 1,000,000,000 it takes 15526 milliseconds.
This confuses me: in this Calc-PI program, MPI_Reduce is called only once
per run, no matter how many intervals are used. It is called after both
nodes have computed their partial results, to merge them. So the time
taken by MPI_Reduce (although it might be slow over a TCP/IP connection)
should be roughly constant. But that is obviously not what I saw.
Has anyone had a similar problem before? I'm not sure how MPI_Reduce()
works internally. Does it matter that I don't have a network file system,
network time protocol, resource manager, scheduler, etc. installed?
Below is the program. I fed "n" to it more than once to warm it up.
#include "mpi.h"
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    int numprocs, myid, rc;
    double ACCUPI = 3.1415926535897932384626433832795;
    double mypi, pi, h, sum, x;
    int n, i;
    double starttime;
    double time, told, bcasttime, reducetime, comptime, totaltime;

    rc = MPI_Init(&argc, &argv);
    if (rc != MPI_SUCCESS) {
        printf("Error starting MPI program. Terminating.\n");
        MPI_Abort(MPI_COMM_WORLD, rc);
    }
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    while (1) {
        if (myid == 0) {
            printf("Enter the number of intervals: (0 quits) \n");
            /* Treat unreadable input as a request to quit. */
            if (scanf("%d", &n) != 1)
                n = 0;
            starttime = MPI_Wtime();
        }

        /* Time the broadcast of n from rank 0 to all ranks. */
        time = MPI_Wtime();
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        told = time;
        time = MPI_Wtime();
        bcasttime = time - told;

        if (n == 0)
            break;
        else {
            /* Midpoint rule: each rank sums every numprocs-th interval. */
            h = 1.0 / (double)n;
            sum = 0.0;
            for (i = myid + 1; i <= n; i += numprocs) {
                x = h * ((double)i - 0.5);
                sum += (4.0 / (1.0 + x * x));
            }
            mypi = sum * h;
            told = time;
            time = MPI_Wtime();
            comptime = time - told;

            /* Time the reduction of the partial results onto rank 0. */
            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            told = time;
            time = MPI_Wtime();
            reducetime = time - told;

            if (myid == 0) {
                totaltime = MPI_Wtime() - starttime;
                printf("\nElapsed time (total): %f milliseconds\n",
                       totaltime * 1000);
                printf("Elapsed time (Bcast): %f milliseconds (%5.2f%%)\n",
                       bcasttime * 1000, bcasttime * 100 / totaltime);
                printf("Elapsed time (Reduce): %f milliseconds (%5.2f%%)\n",
                       reducetime * 1000, reducetime * 100 / totaltime);
                printf("Elapsed time (Comput): %f milliseconds (%5.2f%%)\n",
                       comptime * 1000, comptime * 100 / totaltime);
                printf("\nApproximated pi is %.16f, Error is %.4e\n", pi,
                       fabs(pi - ACCUPI));
            }
        }
    }

    MPI_Finalize();
    return 0;
}