Re: RFC: Telling the middle end about asynchronous/single-sided memory access (Fortran related)

Tobias Burnus Fri, 15 Apr 2011 05:04:56 -0700

On 04/15/2011 11:52 AM, Janne Blomqvist wrote:

Q1: Is __sync_synchronize() sufficient?
I don't think this is correct. __sync_synchronize() just issues a
hardware memory fence instruction. That is, it prevents loads and
stores from moving past the fence *on the processor that executes the
fence instruction*. There is no synchronization with other
processors.

Well, I was thinking of (a) the assumptions the compiler may make about the value when optimizing, and (b) making sure that the variables are really loaded from memory and do not remain in a register. How the data ends up in memory is a different question; for the current library version, SYNC ALL would be a __sync_synchronize() followed by a (wrapped) call to MPI_Barrier, and possibly some additional actions.
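A minimal C sketch of that SYNC ALL shape, for illustration only: library_barrier() is a hypothetical stand-in for the wrapped MPI_Barrier call (it only counts invocations so the fragment needs no MPI), and coarray_var and the function names are likewise assumptions, not part of any actual library.

```c
/* Hypothetical stand-in for the library's barrier (for example a wrapped
   MPI_Barrier); here it only counts calls so the sketch stays self-contained. */
static int barrier_calls = 0;
static void library_barrier(void) { barrier_calls++; }

/* Hypothetical stand-in for a coarray; it has external linkage, so the
   compiler has to assume that other code can see it. */
int coarray_var;

/* Sketch of SYNC ALL as described above: fence first, then the barrier call. */
void sync_all_sketch(void)
{
    __sync_synchronize();  /* GCC builtin: issues a full memory barrier */
    library_barrier();
}

int publish_then_read(int value)
{
    coarray_var = value;   /* store intended to reach memory before the sync */
    sync_all_sketch();
    return coarray_var;    /* load intended to come from memory after the sync */
}
```

In a single process this only shows the ordering of the two steps; whether another image actually sees the value depends on the real barrier behind library_barrier().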

Q2: Can this be optimized in some way?

Probably not. For general issues with the shared-memory model, perhaps
shared memory Co-arrays can piggyback on the work being done for the
C++0x memory model, see

I think you are trying to solve a different problem than the one I have in mind. I am not talking about implementing a full SYNC ALL; what I want for SYNC ALL is that no code is moved across it, that values held in registers are written out to memory before it, and, relatedly, that they are fetched from memory again afterwards.
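The compiler-side half of that requirement can be sketched in C with a compiler-only barrier. This is not what the post proposes (it argues for __sync_synchronize()); the empty asm statement with a "memory" clobber is a GCC/Clang extension used here purely for illustration, and COMPILER_BARRIER, shared_value and store_then_reload are hypothetical names.

```c
/* Compiler-only barrier (GCC/Clang extension): emits no instruction, but the
   "memory" clobber tells the compiler that memory may have been read or
   written at this point, so it should not cache such values in registers
   across it or move memory accesses over it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for a coarray variable (external linkage). */
int shared_value;

int store_then_reload(int v)
{
    shared_value = v;      /* store is expected to be emitted before the barrier */
    COMPILER_BARRIER();
    return shared_value;   /* value is expected to be reloaded after the barrier */
}
```

Unlike __sync_synchronize(), this constrains only the compiler; it emits no hardware fence and does not synchronize with anything.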



On 04/15/2011 12:02 PM, Richard Guenther wrote:

Q2: Can this be optimized in some way?

For simple types you could use atomic instructions for the modification
itself instead of two SYNC ALL calls.

Well, even with atomics you need a barrier; besides, the example was only for illustration. I think if one uses the variable in "foo" before the first SYNC ALL, one would even need two barriers, atomic read/write or not. (For the current example, setting the value in "foo" is pointless. And the obfuscated way the variable is set makes the program fragile: someone modifying it might not see the dependency and break it.)


To conclude:

* For ASYNCHRONOUS, one mostly does not need to do anything, except that for the asynchronous version of the transfer function belonging to READ and WRITE, the data argument needs to be marked as escaping in the "fn spec" attribute. Similarly, for ASYNCHRONOUS dummy arguments, the "fn spec" must be such that the compiler knows that the address could be escaping. (I don't think there is currently a way to mark a variable via "fn spec" as escaping but only used for reading the value, or to restrict the scope of the escaping.)

* For coarrays, I still claim that __sync_synchronize() is enough for SYNC* in terms of restricting code movement and ensuring that register contents are written to memory, and that succeeding accesses to the variable fetch the data from memory. (The actual implementation of a barrier is a separate task, be it a library call or some shared-memory atomic counter. Only for SYNC MEMORY should it be fully sufficient.)


Comments?

Tobias

PS: The coarray example will fail if there are more than two images, as one can wait forever for the SYNC with image 3, with image 4, ...
