On Fri, Apr 15, 2011 at 2:04 PM, Tobias Burnus <bur...@net-b.de> wrote:
> On 04/15/2011 11:52 AM, Janne Blomqvist wrote:
>>
>> Q1: Is __sync_synchronize() sufficient?
>> I don't think this is correct. __sync_synchronize() just issues a
>> hardware memory fence instruction. That is, it prevents loads and
>> stores from moving past the fence *on the processor that executes the
>> fence instruction*. There is no synchronization with other
>> processors.
>
> Well, I was thinking of (a) the assumptions the compiler may make about the
> value when doing optimizations, and (b) making sure that the variables are
> really loaded from memory and do not remain in registers. -- How the data ends up
> in memory is a different question; for the current library version, SYNC ALL
> would be a __sync_synchronize() followed by a (wrapped) call to MPI_Barrier
> - and possibly some additional actions.
>
>>> Q2: Can this be optimized in some way?
>>
>> Probably not. For general issues with the shared-memory model, perhaps
>> shared memory Co-arrays can piggyback on the work being done for the
>> C++0x memory model, see
>
> I think you are trying to solve a different problem than I am. I am not
> talking about implementing a full SYNC ALL; for SYNC ALL I want to ensure
> that no code motion happens across it, that values are moved out of registers
> into memory - and, relatedly, that they are fetched from memory afterwards.
>
>
> On 04/15/2011 12:02 PM, Richard Guenther wrote:
>>>>
>>>> Q2: Can this be optimized in some way?
>>
>> For simple types you could use atomic instructions for the modification
>> itself instead of two SYNC ALL calls.
>
> Well, even with atomics you need a barrier; besides, the example was
> only for illustration. I think if one uses the variable in "foo" before the
> first SYNC ALL, one would even need two barriers - atomic read/write or not.
> (For the current example, setting the value in "foo" is pointless. And the
> obfuscated way the variable is set makes the program fragile: someone
> modifying it might not see the dependency and break it.)

As long as all variables that need protection are global, a call to an
arbitrary (external) function before and after is enough to prevent the
optimization.

> To conclude:
>
> * For ASYNCHRONOUS, one mostly does not need to do anything, except that for
> the asynchronous version of the transfer function belonging to READ and
> WRITE, the data argument needs to be marked as escaping in the "fn spec"
> attribute. Similarly, for ASYNCHRONOUS dummy arguments, the "fn spec" must
> be such that the compiler knows that the address could be escaping. (I don't
> think there is currently a way to mark a variable via "fn spec" as escaping
> but only used for reading the value - or to restrict the scope of the
> escaping.)

That's correct.

Richard.

> * For coarrays, I still claim that __sync_synchronize() is enough for SYNC*
> in terms of restricting code motion and ensuring that register contents are
> written back to memory - and that subsequent accesses to the variable load
> the data from memory. (The actual implementation of a barrier is a separate
> task - be it a library call or some shared-memory atomic counter. Only for
> SYNC MEMORY should it be fully sufficient.)
>
> Comments?
>
> Tobias
>
> PS: The coarray example will fail if there are more than two images, as one
> can wait forever for the SYNC with image 3, with image 4, ...
>
