> The original issue, still reflected by the subject heading of this e-mail,
> was that a message overran its receive buffer. That was fixed by using
> tags to distinguish different kinds of messages (res, jacob, row, and col).
>
> I thought the next problem was the small (10^-10) variations in results
> when np>2. In my mind, a plausible explanation for this is that you're
> adding the "res_cpu" contributions from all the various processes to the
> "res" array in some arbitrary order. The contribution from rank 0 is added
> in first, but all the others come in in some nondeterministic order. Since
> you're using finite-precision arithmetic, this can lead to tiny round-off
> variations.
>
> If you want to get rid of those minor variations, you have to perform
> floating-point arithmetic in a particular order.

Unfortunately it did not work. I replaced the "MPI_ANY_SOURCE" with "JW" but I did not see any difference.
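
For reference, the order dependence described in the quoted text can be reproduced without MPI at all. The sketch below is only an illustration in plain Python, not the actual code from this thread: the `contributions` list is a made-up stand-in for the per-process "res_cpu" values, and `accumulate` is a hypothetical helper that adds them in a given rank order.

```python
# Illustration only: floating-point addition is not associative, so the
# order in which per-rank contributions are accumulated can change the
# last bits of the result.

def accumulate(contributions, order):
    """Add the per-rank contributions in the given rank order."""
    total = 0.0
    for rank in order:
        total += contributions[rank]
    return total

# Hypothetical per-rank partial results (stand-ins for "res_cpu").
contributions = [0.1, 0.2, 0.3]

# Two possible arrival orders for the same three contributions.
sum_a = accumulate(contributions, [0, 1, 2])
sum_b = accumulate(contributions, [1, 2, 0])

# Repeating the same fixed order always reproduces the same bits.
sum_a_again = accumulate(contributions, [0, 1, 2])

print("order 0,1,2:", repr(sum_a))
print("order 1,2,0:", repr(sum_b))
print("difference :", abs(sum_a - sum_b))
print("fixed order repeatable:", sum_a == sum_a_again)
```

The two orders differ only in the last bit here. If results still vary from run to run after the receive order is fixed, then some other part of the computation is presumably being evaluated in a different order between runs, or the variation has a different cause altogether.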