On Fri, Apr 15, 2011 at 22:19, N.M. Maclaren <n...@cam.ac.uk> wrote:
> On Apr 15 2011, Janne Blomqvist wrote:
>> Indeed, I assumed you were discussing how to implement CAF via shared
>> memory. If we use MPI, surely the implementation of MPI_Barrier should
>> itself issue any necessary memory fences (if it uses shared memory),
>> so I don't think __sync_synchronize() would be necessary?
>
> It doesn't have any such semantics.
I don't understand what you mean by that. My point is that if a shared
memory implementation of MPI_Barrier requires issuing memory fence
instructions as part of the barrier algorithm, then it's up to the MPI
implementation to issue them, and thus we don't need to worry about it.
Or are you saying that the semantics of CAF SYNC ALL are those of
MPI_Barrier plus a full memory fence on all images, and since we cannot
guarantee that MPI_Barrier will actually issue fences we must
potentially issue another one?

>> And, as Richi already mentioned, the function call itself is an
>> implicit compiler memory barrier for all variables which might be
>> accessed by the callee. Which implies that any such variables must be
>> flushed to memory before the call and reloaded if read after the call
>> returns. So, in this case I don't think there is anything to worry
>> about.
>
> Not in a threaded context, or with shared memory segments!

My point was that even if we have shared memory, we don't need to handle
function calls any differently.

> Mere function calls have no effect whatsoever on memory consistency
> between threads, and you really, but REALLY, want to avoid doing full
> memory barriers more than you can help.

Indeed, which is why shared-variable programming models tend to have
explicit synchronization constructs instead of implicit barriers at
function call boundaries.

> For complicated reasons, they can be more expensive than MPI_Barrier,
> though they are at least sane and reliable (unlike 'fences').

Really? Why?

-- 
Janne Blomqvist