Thank you so much! It is a synchronization issue. In my case, one node actually runs slower than the other. Adding MPI_Barrier() helps to straighten things out.
Thank you for your help!

Eugene Loh wrote:
Your processes are probably running asynchronously. You could perhaps try tracing program execution and look at the timeline. E.g., http://www.open-mpi.org/faq/?category=perftools#free-tools . Or, where you have MPI_Wtime calls, just capture those timestamps on each process and dump the results at the end of your run. Or, report timings for all ranks instead of just for rank 0.

Put another way, rank 0 must broadcast n. So, no one starts computation until they get the Bcast result. Rank 0 probably starts its computations before anyone else does. So, it gets to the Reduce before anyone else does, but it can't exit until other ranks have finished their computations. So, the Reduce time on rank 0 includes some amount of other ranks' compute times.
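
The reasoning above can be checked with a toy timeline model. This is only a sketch and makes no MPI calls: the function name `root_reduce_wait` and its parameters are hypothetical, and message latency is assumed to be zero. It estimates how long rank 0 sits inside the Reduce while waiting for the slowest rank.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy timeline model, not an MPI program (names are hypothetical).
 * Rank r starts computing at start[r] seconds and computes for
 * compute[r] seconds. A reduction to rank 0 cannot complete on rank 0
 * before the slowest rank has contributed, so the "Reduce" time that
 * rank 0 measures is roughly the gap between its own finish time and
 * the latest finish time. Message latency is assumed to be zero.
 */
double root_reduce_wait(const double *start, const double *compute,
                        size_t nranks)
{
    assert(nranks >= 1);

    double root_done = start[0] + compute[0];
    double latest = root_done;

    for (size_t r = 1; r < nranks; r++) {
        double done = start[r] + compute[r];
        if (done > latest)
            latest = done;
    }
    /* Zero when rank 0 is itself the last rank to finish. */
    return latest - root_done;
}
```

With two ranks that start together, where rank 0 computes for 1.0 s and rank 1 for 1.5 s, the function returns 0.5. Multiplying both compute times by 1000 multiplies the result by 1000, which is the pattern of a Reduce time that grows in proportion to n.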

Yet another approach is to insert MPI_Barrier calls at each phase of the program so that the various phases are synchronized. This adds some overhead to the program, but helps simplify interpretation of the timing results.

Qing Pang wrote:

I'm running the popular calculate-pi program on a two-node setup running Ubuntu 8.10 and Open MPI 1.3.3 (with default settings). Password-less ssh is set up, but there is no cluster infrastructure such as a network file system, network time protocol, resource manager, or scheduler. The two nodes are connected through TCP/IP only.

When I benchmarked the program, the time spent in MPI_Reduce() turned out to be proportional to the number of intervals (n) used in the calculation. For example, with n = 1,000,000, MPI_Reduce takes 15.65 milliseconds, while with n = 1,000,000,000 it takes 15526 milliseconds.

This confused me. In this calculate-pi program, MPI_Reduce is called only once for each value of n, after both nodes have computed their partial results, to merge them, no matter how many intervals are used. So the time spent in MPI_Reduce (although it might be slow over a TCP/IP connection) should be roughly constant. But that is obviously not what I saw.

Has anyone had a similar problem before? I'm not sure how MPI_Reduce() works internally. Does it matter that I don't have a network file system, network time protocol, resource manager, scheduler, etc. installed?

Below is the program - I did feed "n" to it more than once to warm it up.

#include "mpi.h"
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
   int numprocs, myid, rc;
   double ACCUPI = 3.1415926535897932384626433832795;
   double mypi, pi, h, sum, x;
   int n, i;
   double starttime, endtime;
   double time,told,bcasttime,reducetime,comptime,totaltime;

   rc = MPI_Init(&argc,&argv);
   if (rc != MPI_SUCCESS) {
      printf("Error starting MPI program. Terminating.\n");
      MPI_Abort(MPI_COMM_WORLD, rc);
   }
   MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD,&myid);

   while (1) {
      if (myid == 0) {
         printf("Enter the number of intervals: (0 quits) \n");
         scanf("%d",&n);
         starttime = MPI_Wtime();
      }

      time = MPI_Wtime();
      MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

      told = time;
      time = MPI_Wtime();
      bcasttime = time - told;

      if (n == 0)
         break;
      else {
         h = 1.0/(double)n;
         sum = 0.0;
         for (i = myid + 1; i <= n; i += numprocs) {
             x = h*((double)i - 0.5);
             sum += (4.0/(1.0 + x*x));
         }
         mypi = sum*h;

         told = time;
         time = MPI_Wtime();
         comptime = time - told;

         MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

         told = time;
         time = MPI_Wtime();
         reducetime = time - told;

         if (myid == 0) {
            totaltime = MPI_Wtime() - starttime;
            printf("\nElapsed time (total): %f milliseconds\n",totaltime*1000);
            printf("Elapsed time (Bcast): %f milliseconds (%5.2f%%)\n",bcasttime*1000,bcasttime*100/totaltime);
            printf("Elapsed time (Reduce): %f milliseconds (%5.2f%%)\n",reducetime*1000,reducetime*100/totaltime);
            printf("Elapsed time (Comput): %f milliseconds (%5.2f%%)\n",comptime*1000,comptime*100/totaltime);
            printf("\nApproximated pi is %.16f, Error is %.4e\n", pi, fabs(pi - ACCUPI));
         }
      }
   }

   MPI_Finalize();
   return 0;
}


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

