On Fri, Apr 15, 2011 at 12:02, Tobias Burnus <bur...@net-b.de> wrote: > Dear all, > > I have question how one can and should tell the middle end about > asynchonous/single-sided memory access; the goal is to produce fast but > race-free code. All the following is about Fortran 2003 (asynchronous) and > Fortran 2008 (coarrays), but the problem itself should occur with all(?) > supported languages. It definitely occurs when using C with MPI or using C > with the asynchronous I/O functions, though I do not know in how far one > currently relys on luck. > > There are two issues, which need to be solved by informing the middle end - > either by variable attributes or by inserted function calls: > - Prohibiting some code movements > - Making assumptions about the memory content > > a) ASYNCHRONOUS attribute and asynchronous I/O > > Fortran allows asynchronous I/O, which means for the programmer that between > initiating the asynchronous reading/writing and the finishing read/write, > the variable may not be accessed (for READ) or not be changed (for WRITE). > The compiler needs to make sure that it does not move code such that this > constraint is violated. All variables involved in asynchronous operations > are marked as ASYNCHRONOUS. > > Thus, for asynchronous operations, code movements involving opaque function > calls should not happen - but contrary to VOLATILE, there is no need to take > the value all time from the memory if it is still in the register. > > Example: > > integer, ASYNCHRONOUS :: async_int > > WRITE (unit, ASYNCHRONOUS='yes') async_int > ! ... > WAIT (unit) > a = async_int > do i = 1, 10 > b(i) = async_int + 1 > end do > > Here, "a = async_int" may not be moved before the WAIT line. However, > contrary to VOLATILE, one can move the "async_int + 1" before the loop and > use the value from the registry in the loop. Note additionally that the > initiation of an asynchronous operation (WRITE statement above) is known at > compile time; however, it is not known when it ends - the > WAIT can be in a different translation unit. See also PR 25829. > > The Fortran 2008 standard is not very explicit about the ASYNCHRONOUS > attribute itself; it simply states that it is for asynchronous I/O. > (However, it describes then how async I/O works, > including WAIT, INQUIRE, and what a programmer may do until the async I/O is > finished.) The closed to an ASYNCHRONOUS definition is the non-normative > note 5.4 of Fortran 2008: > > "The ASYNCHRONOUS attribute specifies the variables that might be associated > with a pending input/output storage sequence (the actual memory locations on > which asynchronous input/output is being performed) while the scoping unit > is in execution. This information could be used by the compiler to disable > certain code motion optimizations." > > Seemingly intended, but not that clear in the F2003/F2008 standard, is to > allow for asynchronous user operations; this will presumbly refined in TR > 29113 which is currently being drafted - and/or in an interpretation > request. The main requestee for this feature is the MPI Forum, which works > on MPI3. In any case the following should work analogously and "buf" should > not be moved before the "MPI_Wait" line: > > CALL MPI_Irecv(buf, rq) > CALL MPI_Wait(rq) > xnew=buf > > Hereby, "buf" and (maybe?) the first dummy argument of MPI_Irecv have the > ASYNCHRONOUS attribute. > > My question is now: How to properly tell this the middle end? > > VOLATILE seems to be wrong as it prevents way too many optimizations and I > think it does not completely prevent code moving. Using a call to some > built-in function does not work as in principle the end of an asynchronous > operation is not known. It could end with a WAIT - possibly also wrapped in > a function, which is in a different translation unit - or also with an > INQUIRE(..., PENDING=aio_pending) if "aio_pending" gets assigned a .false. > > (Frankly, I am not 100% sure about the exact semantics of ASYNCHRONOUS; I > think might be implemented by preventing all code movements which involve > swapping an ASYNCHRONOUS variable with a function call, which is not pure. > Otherwise, in terms of the variable value, it acts like a normal variable, > i.e. if one does: "a = 7" and does not set "a" afterwards (assignment or via > function calls), it remains 7. The changing of the variable is explicit - > even if it only becomes effective with some delay.)
A compiler memory barrier in GCC is asm volatile("" ::: "memory"); That being said, does the middle end really move loads and stores past function calls? If not, a call is effectively also a compiler memory barrier, no? > B) COARRAYS > > The memory model of coarrays is that all memory is private to the image - > except for coarrays. Coarrays exists on all images. For "integer :: > coarray(:)[*]", local accesses are "coarray = ..." or "coarray(4) = ..." > while remote accesses are "coarray(:)[7] = ..." or "a = coarray(3)[2]", > where the data is set on image 7 or pulled from image 2. > > Let's start directly with an example: > > module m > integer, save :: caf_int[*] ! Global variable > end module m > > subroutine foo() > use m > caf_int = 7 ! Set local variable to 7 (effectively: on image 1 only) > SYNC ALL ! Memory barrier/fence > SYNC ALL > ! caf_int should now be 8, cf. below; thus the following if shall > ! neither optimized way not be executed at run time. > if (caf_int == 7) call abort() > end subroutine foo > > subroutine bar() > use m > SYNC ALL > caf_int[1] = 8 ! Set variable on image 1 to 8 > SYNC ALL > end subroutine bar > > program caf_example > if (this_image() == 1) CALL foo() > if (this_image() == 2) CALL bar() > end program caf_example > > Notes: > - The coarray "caf_int" will be registered in the communication library at > startup of the main program. > - For image 1 one always accesses "caf_int" in local memory. The variable > also does not alias with anything - except that the value might change via > single-sided communication. > > Thus: SYNC ALL acts as memory fence - for coarrays only. In principle, all > other variables might be moved across the fence. Besides preventing code > moves, the value of the variable cannot be assumed to be the same as before > the fence. I think a simple call to "__sync_synchronize()" (alias > BUILT_IN_SYNCHRONIZE) should take care of this, but I want to > confirm that it indeed does so. I assume I can still keep all coarrays as > restricted pointers - even though there is single-sided communication. > > Q1: Is __sync_synchronize() sufficient? I don't think this is correct. __sync_synchronize() just issues a hardware memory fence instruction. That is, it prevents loads and stores from moving past the fence *on the processor that executes the fence instruction*. There is no synchronization with other processors. SYNC ALL, OTOH is a full barrier; the implementation must be something like a counter that every co-image increments atomically, and then waits (blocking or spinning) until the counter equals the number of co-images. > Q2: Can this be optimized in some way? Probably not. For general issues with the shared-memory model, perhaps shared memory Co-arrays can piggyback on the work being done for the C++0x memory model, see http://gcc.gnu.org/wiki/Atomic/GCCMM Hans Boehm maintains a collection of links to various papers on the topic at http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/ -- Janne Blomqvist