Dear users,

I'm running the popular Calculate PI program on a 2-node setup running Ubuntu 8.10 and Open MPI 1.3.3 (with default settings). Password-less SSH is set up, but there is no cluster-management software such as a network file system, network time protocol, resource manager, or scheduler. The two nodes are connected through TCP/IP only.

When I benchmarked the program, the time spent in MPI_Reduce() turned out to be proportional to the number of intervals (n) used in the calculation. For example, with n = 1,000,000, MPI_Reduce costs 15.65 milliseconds; with n = 1,000,000,000, it costs 15526 milliseconds.

This confuses me - in this Calc-PI program, MPI_Reduce is called only once, after both nodes have computed their partial sums, to merge the results. No matter how many intervals are used, it is a single reduction of a single double. So the time spent in MPI_Reduce (although it might be slow over a TCP/IP connection) should be roughly constant - but that's clearly not what I'm seeing.

Has anyone run into a similar problem before? I'm not sure how MPI_Reduce() works internally. Does the fact that I don't have a network file system, network time protocol, resource manager, scheduler, etc. installed matter?

Below is the program - I did feed "n" to it more than once to warm it up.

#include "mpi.h"
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[]) {
  int numprocs, myid, rc;
  double ACCUPI = 3.1415926535897932384626433832795;
  double mypi, pi, h, sum, x;
  int n, i;
  double starttime, endtime;
  double time, told, bcasttime, reducetime, comptime, totaltime;

  rc = MPI_Init(&argc,&argv);
  if (rc != MPI_SUCCESS) {
     printf("Error starting MPI program. Terminating.\n");
     MPI_Abort(MPI_COMM_WORLD, rc);
  }
  MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD,&myid);

  while (1) {
     if (myid == 0) {
        printf("Enter the number of intervals: (0 quits) \n");
        scanf("%d",&n);
        starttime = MPI_Wtime();
     }

     time = MPI_Wtime();
     MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

     told = time;
     time = MPI_Wtime();
     bcasttime = time - told;

     if (n == 0)
        break;
     else {
        h = 1.0/(double)n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h*((double)i - 0.5);
            sum += (4.0/(1.0 + x*x));
        }
        mypi = sum*h;

        told = time;
        time = MPI_Wtime();
        comptime = time - told;

        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        told = time;
        time = MPI_Wtime();
        reducetime = time - told;

        if (myid == 0) {
           totaltime = MPI_Wtime() - starttime;
           printf("\nElapsed time (total): %f milliseconds\n", totaltime*1000);
           printf("Elapsed time (Bcast): %f milliseconds (%5.2f%%)\n", bcasttime*1000, bcasttime*100/totaltime);
           printf("Elapsed time (Reduce): %f milliseconds (%5.2f%%)\n", reducetime*1000, reducetime*100/totaltime);
           printf("Elapsed time (Comput): %f milliseconds (%5.2f%%)\n", comptime*1000, comptime*100/totaltime);
           printf("\nApproximated pi is %.16f, Error is %.4e\n", pi, fabs(pi - ACCUPI));
        }
     }
  }

  MPI_Finalize();
}
