Manfred Muecke wrote:
> I would like to understand in more detail how much time some collective
> communication calls really spend waiting for the last process to enter. I
> know this can be done by logging entry times for each process, but I
> wonder if there is a better and more efficient way.
"Better and more efficient" depends on what you care about. If a quick
answer is enough, an expedient solution is to write PMPI wrappers for the
collectives in question, in which you force a barrier before the
collective operation itself. The time each process spends in those
barriers is then roughly the time it waited for the last process to
arrive. I don't know whether that particular approach works for you.
E.g.,
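    #include <mpi.h>

    /* MPI-3 prototype; before MPI-3, sendbuf is a plain void *. */
    int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
                      MPI_Datatype datatype, MPI_Op op, MPI_Comm comm) {
        PMPI_Barrier(comm);  /* every rank waits here for the last one */
        return PMPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm);
    }

with timing added around the barrier (e.g., MPI_Wtime() before and after)
to record the wait.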