On Fri, Apr 15, 2011 at 11:52 AM, Janne Blomqvist <blomqvist.ja...@gmail.com> wrote:
> On Fri, Apr 15, 2011 at 12:02, Tobias Burnus <bur...@net-b.de> wrote:
>> Dear all,
>>
>> I have a question about how one can and should tell the middle end about
>> asynchronous/single-sided memory access; the goal is to produce fast but
>> race-free code. All the following is about Fortran 2003 (asynchronous) and
>> Fortran 2008 (coarrays), but the problem itself should occur with all(?)
>> supported languages. It definitely occurs when using C with MPI or using C
>> with the asynchronous I/O functions, though I do not know to what extent
>> one currently relies on luck.
>>
>> There are two issues, which need to be solved by informing the middle end -
>> either by variable attributes or by inserted function calls:
>> - Prohibiting some code movements
>> - Making assumptions about the memory content
>>
>> a) ASYNCHRONOUS attribute and asynchronous I/O
>>
>> Fortran allows asynchronous I/O, which means for the programmer that
>> between initiating the asynchronous reading/writing and the finishing
>> read/write, the variable may not be accessed (for READ) or changed (for
>> WRITE). The compiler needs to make sure that it does not move code such
>> that this constraint is violated. All variables involved in asynchronous
>> operations are marked as ASYNCHRONOUS.
>>
>> Thus, for asynchronous operations, code movement involving opaque function
>> calls must not happen - but contrary to VOLATILE, there is no need to
>> reload the value from memory every time if it is still in a register.
>>
>> Example:
>>
>> integer, ASYNCHRONOUS :: async_int
>>
>> WRITE (unit, ASYNCHRONOUS='yes') async_int
>> ! ...
>> WAIT (unit)
>> a = async_int
>> do i = 1, 10
>>   b(i) = async_int + 1
>> end do
>>
>> Here, "a = async_int" may not be moved before the WAIT line. However,
>> contrary to VOLATILE, one can move the load for "async_int + 1" before
>> the loop and use the value from the register in the loop.
>> Note additionally that the initiation of an asynchronous operation (the
>> WRITE statement above) is known at compile time; however, it is not known
>> when it ends - the WAIT can be in a different translation unit. See also
>> PR 25829.
>>
>> The Fortran 2008 standard is not very explicit about the ASYNCHRONOUS
>> attribute itself; it simply states that it is for asynchronous I/O.
>> (However, it then describes how async I/O works, including WAIT, INQUIRE,
>> and what a programmer may do until the async I/O is finished.) The closest
>> to a definition of ASYNCHRONOUS is the non-normative note 5.4 of Fortran
>> 2008:
>>
>> "The ASYNCHRONOUS attribute specifies the variables that might be
>> associated with a pending input/output storage sequence (the actual memory
>> locations on which asynchronous input/output is being performed) while the
>> scoping unit is in execution. This information could be used by the
>> compiler to disable certain code motion optimizations."
>>
>> Seemingly intended, but not that clear in the F2003/F2008 standard, is to
>> allow for asynchronous user operations; this will presumably be refined in
>> TR 29113, which is currently being drafted - and/or in an interpretation
>> request. The main requester for this feature is the MPI Forum, which works
>> on MPI3. In any case, the following should work analogously, and the read
>> of "buf" should not be moved before the "MPI_Wait" line:
>>
>> CALL MPI_Irecv(buf, rq)
>> CALL MPI_Wait(rq)
>> xnew = buf
>>
>> Hereby, "buf" and (maybe?) the first dummy argument of MPI_Irecv have the
>> ASYNCHRONOUS attribute.
>>
>> My question is now: How does one properly tell this to the middle end?
>>
>> VOLATILE seems to be wrong as it prevents way too many optimizations, and
>> I think it does not completely prevent code motion. Using a call to some
>> built-in function does not work as in principle the end of an asynchronous
>> operation is not known.
>> It could end with a WAIT - possibly also wrapped in a function which is in
>> a different translation unit - or also with an
>> INQUIRE(..., PENDING=aio_pending) if "aio_pending" gets assigned a .false.
>>
>> (Frankly, I am not 100% sure about the exact semantics of ASYNCHRONOUS; I
>> think it might be implemented by preventing all code motion that reorders
>> an access to an ASYNCHRONOUS variable with a call to a function that is
>> not pure. Otherwise, in terms of the variable value, it acts like a normal
>> variable, i.e. if one does "a = 7" and does not set "a" afterwards (by
>> assignment or via function calls), it remains 7. The changing of the
>> variable is explicit - even if it only becomes effective with some delay.)
>
> A compiler memory barrier in GCC is
>
>   asm volatile("" ::: "memory");
>
> That being said, does the middle end really move loads and stores past
> function calls? If not, a call is effectively also a compiler memory
> barrier, no?
Yes, a function call is also a compiler memory barrier for all memory that
the function can possibly access. There is currently no easy way to restrict
the barrier effect to a certain kind of memory (like memory marked with
ASYNCHRONOUS); a call simply acts as a barrier for all escaped memory (and
variables used for I/O escape, as they are passed by reference to the I/O
library functions).

>> b) COARRAYS
>>
>> The memory model of coarrays is that all memory is private to the image -
>> except for coarrays. Coarrays exist on all images. For "integer ::
>> coarray(:)[*]", local accesses are "coarray = ..." or "coarray(4) = ...",
>> while remote accesses are "coarray(:)[7] = ..." or "a = coarray(3)[2]",
>> where the data is set on image 7 or pulled from image 2.
>>
>> Let's start directly with an example:
>>
>> module m
>>   integer, save :: caf_int[*]  ! Global variable
>> end module m
>>
>> subroutine foo()
>>   use m
>>   caf_int = 7  ! Set local variable to 7 (effectively: on image 1 only)
>>   SYNC ALL     ! Memory barrier/fence
>>   SYNC ALL
>>   ! caf_int should now be 8, cf. below; thus the following "if" shall
>>   ! neither be optimized away nor be executed at run time.
>>   if (caf_int == 7) call abort()
>> end subroutine foo
>>
>> subroutine bar()
>>   use m
>>   SYNC ALL
>>   caf_int[1] = 8  ! Set variable on image 1 to 8
>>   SYNC ALL
>> end subroutine bar
>>
>> program caf_example
>>   if (this_image() == 1) CALL foo()
>>   if (this_image() == 2) CALL bar()
>> end program caf_example
>>
>> Notes:
>> - The coarray "caf_int" will be registered with the communication library
>>   at startup of the main program.
>> - On image 1, "caf_int" is always accessed in local memory. The variable
>>   also does not alias with anything - except that the value might change
>>   via single-sided communication.
>>
>> Thus: SYNC ALL acts as a memory fence - for coarrays only. In principle,
>> all other variables might be moved across the fence.
>> Besides preventing code motion, the value of the variable cannot be
>> assumed to be the same as before the fence. I think a simple call to
>> "__sync_synchronize()" (alias BUILT_IN_SYNCHRONIZE) should take care of
>> this, but I want to confirm that it indeed does so. I assume I can still
>> keep all coarrays as restricted pointers - even though there is
>> single-sided communication.
>>
>> Q1: Is __sync_synchronize() sufficient?
>
> I don't think this is correct. __sync_synchronize() just issues a
> hardware memory fence instruction. That is, it prevents loads and
> stores from moving past the fence *on the processor that executes the
> fence instruction*. There is no synchronization with other
> processors. SYNC ALL, OTOH, is a full barrier; the implementation must
> be something like a counter that every image increments atomically,
> and then waits (blocking or spinning) until the counter equals the
> number of images.

Indeed, it looks like you need something heavier, like a mutex. You can
probably piggy-back on the OpenMP support for barriers.

>> Q2: Can this be optimized in some way?

For simple types you could use atomic instructions for the modification
itself instead of two SYNC ALL calls.

> Probably not. For general issues with the shared-memory model, perhaps
> shared-memory coarrays can piggyback on the work being done for the
> C++0x memory model, see
>
> http://gcc.gnu.org/wiki/Atomic/GCCMM
>
> Hans Boehm maintains a collection of links to various papers on the topic
> at
>
> http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/
>
>
> --
> Janne Blomqvist
>