> Rank 0 accumulates all the res_cpu values into a single array, res. It
> starts with its own res_cpu and then adds all other processes. When
> np=2, that means the order is prescribed. When np>2, the order is no
> longer prescribed and some floating-point rounding variations can start
> to occur.

Yes, you are right. Now the question is: why would these floating-point rounding variations occur for np>2? It cannot be due to an unprescribed order!

> If you want results to be more deterministic, you need to fix the order
> in which res is aggregated. E.g., instead of using MPI_ANY_SOURCE, loop
> over the peer processes in a specific order.
>
> P.S. It seems to me that you could use MPI collective operations to
> implement what you're doing. E.g., something like:

I could use these operations for the res variable (would that make the summation any faster?). But I cannot use them for the other 3 variables.