In any case, do you think that the time NOT spent on actual data
transmission can impact the total time of the broadcast, especially when
there are so many groups that communicate (please refer to the numbers I
gave before if you want to get an idea)?
Also, is there any way to quantify this impact?
Konstantinos,
A simple way is to rewrite MPI_Bcast() and insert a timer and a
PMPI_Barrier() before invoking the real PMPI_Bcast().
The time spent in PMPI_Barrier() can be seen as time NOT spent on actual
data transmission, and since all tasks are synchronized upon exit, the
time spent in PMPI_Bcast() can be seen as time spent on the actual data
transmission.
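For what it's worth, here is a minimal, untested sketch of the kind of
wrapper Gilles describes; it relies only on the standard profiling entry
points (PMPI_*), and the accumulator names and the use of PMPI_Wtime()
as the timer are my own choices:

#include <mpi.h>

static double sync_time  = 0.0;  /* time NOT spent on actual data transmission */
static double bcast_time = 0.0;  /* time spent in the broadcast itself */

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm)
{
    double t0 = PMPI_Wtime();
    PMPI_Barrier(comm);                 /* wait for all tasks in comm */
    double t1 = PMPI_Wtime();
    int rc = PMPI_Bcast(buffer, count, datatype, root, comm);
    double t2 = PMPI_Wtime();

    sync_time  += t1 - t0;
    bcast_time += t2 - t1;
    return rc;
}

The two totals could then be printed per rank at the end of the run, for
example from a similar wrapper around MPI_Finalize().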
Gilles suggested your best next course of action; time the MPI_Bcast and
MPI_Barrier calls and see if there’s a non-linear scaling effect as you
increase group size.
You mention that you’re using m3.large instances; while this isn’t the list for
in-depth discussion about EC2 instances (the AWS
I do not completely understand whether that involves changing some MPI
code. I have no prior experience with that.
But if I get the idea, something like this could potentially work (assume
that comm is the communicator of the group that communicates at each
iteration):
clock_t total_time = clock();
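Filling that idea out a little (a rough sketch only; MPI_Wtime() is used
instead of clock() because it measures wall-clock time, which is what
matters here, and the helper name and parameters are placeholders of my
own):

#include <mpi.h>

/* Time one broadcast on comm, separating the wait for the other tasks
   from the broadcast itself. */
void timed_bcast(void *buffer, int count, MPI_Datatype type, int root,
                 MPI_Comm comm, double *sync_time, double *bcast_time)
{
    double t0 = MPI_Wtime();
    MPI_Barrier(comm);        /* time NOT spent on actual data transmission */
    double t1 = MPI_Wtime();
    MPI_Bcast(buffer, count, type, root, comm);
    double t2 = MPI_Wtime();

    *sync_time  += t1 - t0;
    *bcast_time += t2 - t1;
}

The drawback, as noted below, is that every MPI_Bcast() call site in the
program has to be changed to go through such a helper.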
Konstantinos,
I previously suggested you use the profiling interface (aka PMPI)
specified in the MPI standard.
An example is available at
http://mpi-forum.org/docs/mpi-3.1/mpi31-report/node363.htm#Node363
The pro is that you only need to rewrite MPI_Bcast() once, versus adding
some code around each MPI_Bcast() call.
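With the PMPI approach the application itself stays unchanged: it keeps
calling MPI_Bcast() as before, and a wrapper like the one sketched earlier
(compiled into the program, or built as a shared library and preloaded)
intercepts every call, takes its measurements, and forwards to
PMPI_Bcast().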