Gus Correa wrote:
The redundant calculations of overlap points on neighbor subdomains
in general cannot be avoided.
Exchanging the overlap data across neighbor subdomain processes
cannot be avoided either.
However, **full overlap slices** are exchanged after each computational
step (in our case here a time step).
It is not a point-by-point exchange as you suggested.
Overlap exchange does limit the usefulness/efficiency
of using too many subdomains (e.g. if your overlap-to-useful-data
ratio gets close to 100%).
However, it is not as detrimental as you imagined based on your
point-by-point exchange conjecture.
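To put a number on the slice-versus-point distinction (my own back-of-the-envelope, assuming the same 100x100x100 slab split used in the example below):

```python
# One interior face of a slab in a 100x100x100 domain holds 100*100 points.
face_points = 100 * 100

# Exchanging the whole overlap slice costs one message per face per step;
# a point-by-point exchange would cost one message per face point.
msgs_slice = 1
msgs_point_by_point = face_points

# 10000 times as many messages, each paying its own latency:
print(msgs_point_by_point // msgs_slice)  # 10000
```

The data volume moved is the same either way; what the full-slice exchange avoids is paying per-message latency ten thousand times per face per time step.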
If your domain is 100x100x100 and you split it into subdomain slices
across 5 processes, with a 1-point overlap (on each side),
you will have a 2x5/100 = 10% waste due to overlap calculations
(plus the MPI communication cost/time),
but your problem is still being solved in (almost) 1/5 of the time
it would take in serial mode.
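That 10% figure can be reproduced with a one-liner (a sketch; the function name is mine, and like the text it counts a halo on both sides of every slab, so it slightly overstates the waste for the two end slabs):

```python
def overlap_overhead(n_planes, n_procs, halo=1):
    """Fraction of extra plane computations due to halo overlap,
    for a 1-D slab decomposition, counting both sides of every slab."""
    return 2 * halo * n_procs / n_planes

# 100 planes split across 5 processes with a 1-point halo:
print(overlap_overhead(100, 5))  # 0.1, i.e. the 10% waste quoted above
```

The same formula shows why "too many subdomains" hurts: with 25 processes the overhead is already 2x25/100 = 50% of the useful work.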
I don't understand what "waste" or "redundant calculations" you're
referring to here. Let's say each cell is updated based on itself and
neighboring values. Each cell has a unique "owner" who computes the
updated value. That value is shared with neighbors if the cell is near
the subdomain surface. So, there is a communication cost, but typically
each new value is computed by only one process.
E.g., say we have (in this extreme example) four values in a ring to be
distributed among two processes. So, P0 owns values 1 and 2 and P1 owns
values 3 and 4. In each iteration, boundary values are communicated.
In this particular (extreme) example, that means that each process will
end up holding all four values. But then each process only updates/computes
the values it owns. There is, to be sure, a communication and
synchronization cost, but each value is computed by only one process.
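The four-value ring can be sketched in plain Python (no MPI here; the two "processes" are just entries in a dict, and the three-point averaging update rule is an assumption for illustration):

```python
# Four values in a ring, indices 0..3. P0 owns cells 0 and 1,
# P1 owns cells 2 and 3.
ring = [1.0, 2.0, 3.0, 4.0]
owners = {0: [0, 1], 1: [2, 3]}

def step(values):
    """One iteration: each cell becomes the average of itself and its
    two ring neighbors, computed only by the rank that owns it."""
    n = len(values)
    new = values[:]
    for rank, owned in sorted(owners.items()):
        for i in owned:
            # Reads may touch remote (halo) cells, but each rank
            # writes only the cells it owns -- no value is computed twice.
            left, right = values[(i - 1) % n], values[(i + 1) % n]
            new[i] = (left + values[i] + right) / 3.0
    return new

print(step(ring))  # [2.333..., 2.0, 3.0, 2.666...]
```

Both "ranks" read neighboring values (the halo), but each writes only its own cells, which is the owner-computes point: communication and synchronization cost, yes; duplicated computation, no.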