On 04/15/2011 11:52 AM, Janne Blomqvist wrote:
Q1: Is __sync_synchronize() sufficient?
I don't think this is correct. __sync_synchronize() just issues a
hardware memory fence instruction. That is, it prevents loads and
stores from moving past the fence *on the processor that executes the
fence instruction*. There is no synchronization with other
processors.

Well, I was thinking of (a) the assumptions the compiler makes about the value when optimizing, and (b) making sure that the variables are really loaded from memory and do not remain in a register. How the data ends up in memory is a different question; for the current library version, SYNC ALL would be a __sync_synchronize() followed by a (wrapped) call to MPI_Barrier, and possibly some additional actions.
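To make the ordering concrete, here is a minimal C sketch of that scheme. The names caf_barrier_stub, sync_all_sketch and demo_sync_all are made up for illustration and are not the library's API; the stub only counts calls where a real implementation would call the wrapped MPI_Barrier.

```c
/* Stands in for a coarray variable; it is a global so the compiler
   must assume other code can observe it. */
int shared_value = 0;

/* Hypothetical placeholder for the (wrapped) MPI_Barrier call.
   It only counts invocations so the sketch runs without MPI. */
static int barrier_calls = 0;
static void caf_barrier_stub(void)
{
    barrier_calls++;
}

/* Sketch of SYNC ALL as described above: fence first, then barrier.
   __sync_synchronize() is GCC's full memory barrier builtin. */
static void sync_all_sketch(void)
{
    __sync_synchronize();
    caf_barrier_stub();
}

/* Store before the sync, load after it. The intent is that the store
   reaches memory before the barrier and the load is done afterwards. */
int demo_sync_all(void)
{
    shared_value = 42;
    sync_all_sketch();
    return shared_value;
}
```

Calling demo_sync_all() from a main() is enough to exercise it; the actual cross-image synchronization is deliberately left to whatever replaces the stub.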

Q2: Can this be optimized in some way?
Probably not. For general issues with the shared-memory model, perhaps
shared memory Co-arrays can piggyback on the work being done for the
C++0x memory model, see

I think you are trying to solve a different problem from the one I have in mind. I am not talking about implementing a full SYNC ALL; what I want for SYNC ALL is that no code motion happens across it, that values held in registers are written out to memory before it, and, relatedly, that they are fetched from memory again afterwards.


On 04/15/2011 12:02 PM, Richard Guenther wrote:
Q2: Can this be optimized in some way?
For simple types you could use atomic instructions for the modification
itself instead of two SYNC ALL calls.

Well, even with atomics you need a barrier; besides, the example was only for illustration. I think that if one uses the variable in "foo" before the first SYNC ALL, one would even need two barriers, atomic read/write or not. (In the current example, setting the value in "foo" is pointless. And the obfuscated way the variable is set makes the program fragile: someone modifying it might not see the dependency and break it.)
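As a rough illustration of that point, the following C sketch does the modification with a GCC atomic builtin and still calls a barrier afterwards. The names atomic_add, update_then_sync and image_barrier_stub are hypothetical, and the barrier is only a counting placeholder for the real image synchronization.

```c
/* Stands in for a scalar coarray variable of a simple type. */
long counter = 0;

/* Hypothetical placeholder for the image barrier (for example a
   wrapped MPI_Barrier); it only counts calls in this sketch. */
static int barrier_calls = 0;
static void image_barrier_stub(void)
{
    barrier_calls++;
}

/* Atomic read-modify-write using a GCC builtin; returns the new value. */
long atomic_add(long delta)
{
    return __sync_add_and_fetch(&counter, delta);
}

/* The update itself is atomic, but another image still has to wait
   at a barrier before it can rely on seeing the new value. */
long update_then_sync(long delta)
{
    long v = atomic_add(delta);
    image_barrier_stub();
    return v;
}
```

The atomic builtin replaces the pair of SYNC ALL calls around the modification only; it does not tell the other image when the value is ready.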

To conclude:

* For ASYNCHRONOUS, one mostly does not need to do anything, except that for the asynchronous version of the transfer function belonging to READ and WRITE, the data argument needs to be marked as escaping in the "fn spec" attribute. Similarly, for ASYNCHRONOUS dummy arguments, the "fn spec" must be such that the compiler knows that the address could be escaping. (I don't think there is currently a way to mark a variable via "fn spec" as escaping but only being used for reading the value, or to restrict the scope of the escaping.)

* For coarrays, I still claim that __sync_synchronize() is enough for SYNC* in terms of restricting code motion and ensuring that values in registers are written to memory, and that for subsequent accesses to the variable the data comes from memory. (The actual implementation of a barrier is a separate task, be it a library call or some shared-memory atomic counter. Only for SYNC MEMORY should it be fully sufficient.)

Comments?

Tobias

PS: The coarray example will fail if there are more than two images, as one can wait forever for the SYNC with image 3, with image 4, ...
