On Fri, Apr 15, 2011 at 2:04 PM, Tobias Burnus <bur...@net-b.de> wrote:
> On 04/15/2011 11:52 AM, Janne Blomqvist wrote:
>>
>> Q1: Is __sync_synchronize() sufficient?
>> I don't think this is correct. __sync_synchronize() just issues a
>> hardware memory fence instruction. That is, it prevents loads and
>> stores from moving past the fence *on the processor that executes the
>> fence instruction*. There is no synchronization with other
>> processors.
>
> Well, I was thinking of (a) assumptions regarding the value for the compiler
> when doing optimizations. And (b) making sure that the variables are really
> loaded from memory and not remain in the register. -- How the data ends up
> in memory is a different question; for the current library version, SYNC ALL
> would be a __sync_synchronize() followed by a (wrapped) call to MPI_Barrier
> - and possibly some additional actions.
>
>>> Q2: Can this be optimized in some way?
>>
>> Probably not. For general issues with the shared-memory model, perhaps
>> shared memory Co-arrays can piggyback on the work being done for the
>> C++0x memory model, see
>
> I think you try to solve a different problem than I want. I am not talking
> about implementing a full SYNC ALL, but I want to implement for SYNC ALL
> that no code moving happens and that the memory is moved out of the register
> into the memory - and related, fetched from the memory afterwards.
>
>
> On 04/15/2011 12:02 PM, Richard Guenther wrote:
>>>>
>>>> Q2: Can this be optimized in some way?
>>
>> For simple types you could use atomic instructions for the modification
>> itself instead of two SYNC ALL calls.
>
> Well, even with atomic you need to have a barrier; besides the example was
> only for illustration. I think if one uses the variable in "foo" before the
> first sync all, one even would need two barriers - atomic read/write or not.
> (For the current example, setting the value in "foo" is pointless. And the
> obfuscated way the variable is set, makes the program fragile: someone
> modifying might not see the dependency and break it.)

As long as all variables that need protection are global, a random function
call before and after is enough to avoid optimization.

> To conclude:
>
> * For ASYNCHRONOUS, one mostly does not need to do anything. Except that for
> the asynchronous version of the transfer function belonging to READ and
> WRITE, the data argument needs to be marked as escaping in the "fn spec"
> attribute. Similarly, for ASYNCHRONOUS dummy arguments, the "fn spec" must
> be such that the compiler knows that the address could be escaping. (I don't
> think there is currently a way to mark via "fn spec" a variable as escaping
> but only be used for reading the value - or to restrict the scope of the
> escaping.)

That's correct.

Richard.

> * For coarrays, I still claim that __sync_synchronize() is enough for SYNC*
> in terms of restricting code moving and ensuring the registers are put into
> the memory - and for succeeding accesses to the variable, the data comes
> from the memory. (The actual implementation of a barrier is a separate task
> - be it a library call or some shared-memory atomic counter. Only for SYNC
> MEMORY it should be fully sufficient.)
>
> Comments?
>
> Tobias
>
> PS: The coarray example will fail if there are more than two images, as one
> can wait forever for the SYNC with image 3, with image 4, ...
>
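
For illustration only, here is a minimal C sketch of the two ideas quoted
above: a fence that keeps the compiler from caching a global in a register
across it, and an atomic update of a simple type. It is not the libgfortran
implementation. The names coarray_var, sync_memory, publish_and_reload,
atomic_increment and self_check are hypothetical; only __sync_synchronize()
and __sync_add_and_fetch() are real GCC built-ins. The sketch assumes the
built-in is treated as a full barrier, as the GCC documentation describes,
and it says nothing about making the data visible to other images, which
the quoted text leaves to a library call such as MPI_Barrier.

```c
#include <assert.h>

/* Hypothetical stand-in for a coarray variable. It is global, so an
   opaque function call could also be assumed to modify it. */
int coarray_var = 0;

/* Hypothetical lowering of SYNC MEMORY: a full fence and nothing else.
   No barrier across images is implied. */
static void sync_memory(void)
{
    __sync_synchronize();
}

/* Store, fence, reload: under the stated assumption the store is
   emitted before the fence and the load is issued again after it. */
int publish_and_reload(int v)
{
    coarray_var = v;
    sync_memory();
    return coarray_var;
}

/* Atomic update of a simple type; returns the new value. */
int atomic_increment(void)
{
    return __sync_add_and_fetch(&coarray_var, 1);
}

/* Single-image sanity check; it cannot show cross-image ordering. */
void self_check(void)
{
    assert(publish_and_reload(41) == 41);
    assert(atomic_increment() == 42);
}
```

Calling self_check() on one image only confirms that the values round-trip.
Whether another image observes them still depends on the barrier
implementation, which is the separate task mentioned in the quoted text.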