On 04/15/2011 11:52 AM, Janne Blomqvist wrote:
Q1: Is __sync_synchronize() sufficient?
I don't think this is correct. __sync_synchronize() just issues a
hardware memory fence instruction. That is, it prevents loads and
stores from moving past the fence *on the processor that executes the
fence instruction*. There is no synchronization with other
processors.
Well, I was thinking of (a) the assumptions the compiler makes about
the value when optimizing, and (b) making sure that the variables are
really loaded from memory and do not remain in a register.
-- How the data ends up in memory is a different question; for the
current library version, SYNC ALL would be a __sync_synchronize()
followed by a (wrapped) call to MPI_Barrier - and possibly some
additional actions.
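
The split described above (fence first, then the library barrier) could be sketched roughly as follows. This is only an illustration: the names caf_sync_memory, caf_sync_all, caf_set_barrier and stub_barrier are hypothetical and not taken from any existing library. The barrier is left as a pluggable hook so that the sketch compiles without MPI; in an MPI-based library the hook would presumably wrap MPI_Barrier.

```c
#include <stddef.h>

/* Hypothetical hook type for the inter-image barrier. In an MPI-based
   library the registered function would wrap MPI_Barrier; it is kept
   pluggable here so the sketch builds without MPI. */
typedef void (*caf_barrier_fn)(void);

static caf_barrier_fn caf_barrier_hook = NULL;

/* Register the function that performs the actual barrier. */
void caf_set_barrier(caf_barrier_fn fn)
{
    caf_barrier_hook = fn;
}

/* SYNC MEMORY: only the fence. The GCC builtin issues a full memory
   barrier, so loads and stores are not moved across this point. */
void caf_sync_memory(void)
{
    __sync_synchronize();
}

/* SYNC ALL: the fence, followed by the (wrapped) barrier call. */
void caf_sync_all(void)
{
    __sync_synchronize();
    if (caf_barrier_hook != NULL)
        caf_barrier_hook();
}

/* Single-image stand-in for the barrier: it only counts its calls. */
static int stub_barrier_calls = 0;

static void stub_barrier(void)
{
    ++stub_barrier_calls;
}
```

With a single image one would register the stub via caf_set_barrier(stub_barrier); any "additional actions" would go after the hook call in caf_sync_all.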
Q2: Can this be optimized in some way?
Probably not. For general issues with the shared-memory model, perhaps
shared memory Co-arrays can piggyback on the work being done for the
C++0x memory model, see
I think you are trying to solve a different problem from the one I have
in mind. I am not talking about implementing a full SYNC ALL; I only
want to ensure for SYNC ALL that no code is moved across it and that
values held in registers are written out to memory - and, relatedly,
fetched from memory again afterwards.
On 04/15/2011 12:02 PM, Richard Guenther wrote:
Q2: Can this be optimized in some way?
For simple types you could use atomic instructions for the modification
itself instead of two SYNC ALL calls.
Well, even with atomics you need to have a barrier; besides, the example
was only for illustration. I think if one uses the variable in "foo"
before the first SYNC ALL, one would even need two barriers - atomic
read/write or not.
(For the current example, setting the value in "foo" is pointless. And
the obfuscated way the variable is set makes the program fragile:
someone modifying the code might not see the dependency and break it.)
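
For reference, the atomic modification suggested in the quoted text could look like the sketch below, using the GCC __sync builtins. The variable shared_value and the helper names are hypothetical; the variable merely stands in for a scalar that several threads on one shared-memory node update. As argued above, this makes the update itself indivisible but does not by itself replace the barrier that orders it against other accesses.

```c
/* Hypothetical shared scalar updated by several threads. */
static int shared_value = 0;

/* Atomically add delta and return the value held before the addition.
   GCC documents the __sync builtins as full memory barriers. */
int atomic_add(int delta)
{
    return __sync_fetch_and_add(&shared_value, delta);
}

/* Atomic read, expressed as an atomic addition of zero. */
int atomic_read(void)
{
    return __sync_fetch_and_add(&shared_value, 0);
}
```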
To conclude:
* For ASYNCHRONOUS, one mostly does not need to do anything. Except that
for the asynchronous version of the transfer function belonging to READ
and WRITE, the data argument needs to be marked as escaping in the "fn
spec" attribute. Similarly, for ASYNCHRONOUS dummy arguments, the "fn
spec" must be such that the compiler knows that the address could be
escaping. (I don't think there is currently a way to mark via "fn spec"
a variable as escaping but only used for reading the value - or to
restrict the scope of the escaping.)
* For coarrays, I still claim that __sync_synchronize() is enough for
SYNC* in terms of preventing code motion and ensuring that register
contents are written to memory - and that succeeding accesses to the
variable read the data from memory. (The actual implementation of a
barrier is a separate task - be it a library call or some shared-memory
atomic counter. Only for SYNC MEMORY should it be fully sufficient.)
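
As an illustration of the "shared-memory atomic counter" option mentioned in the last item, here is a minimal sketch of a spinning counter barrier built on the __sync builtins. It assumes all participants are threads sharing one address space; the type counter_barrier and its functions are hypothetical names, and a real library would likely block or back off rather than spin.

```c
/* Hypothetical shared-memory barrier state. */
typedef struct {
    int nthreads;        /* number of participants */
    volatile int count;  /* arrivals in the current phase */
    volatile int sense;  /* flips each time the barrier opens */
} counter_barrier;

void counter_barrier_init(counter_barrier *b, int nthreads)
{
    b->nthreads = nthreads;
    b->count = 0;
    b->sense = 0;
}

void counter_barrier_wait(counter_barrier *b)
{
    /* Remember the phase we arrive in before announcing ourselves. */
    int old_sense = b->sense;

    if (__sync_add_and_fetch(&b->count, 1) == b->nthreads) {
        /* Last arrival: reset the counter first, then open the barrier. */
        b->count = 0;
        __sync_synchronize();
        b->sense = !old_sense;
    } else {
        /* Spin until the last arrival flips the sense flag. */
        while (b->sense == old_sense)
            __sync_synchronize();
    }

    /* Fence on the way out so later accesses are not moved before it. */
    __sync_synchronize();
}
```

The counter is reset before the sense flag flips, so a thread that re-enters the barrier immediately starts counting from zero in the next phase.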
Comments?
Tobias
PS: The coarray example will fail if there are more than two images, as
one can wait forever for the SYNC with image 3, with image 4, ...