RFC: Telling the middle end about asynchronous/single-sided memory access (Fortran related)

Tobias Burnus Fri, 15 Apr 2011 02:02:54 -0700

Dear all,

I have a question about how one can and should tell the middle end about asynchronous/single-sided memory access; the goal is to produce fast but race-free code. All of the following is about Fortran 2003 (asynchronous I/O) and Fortran 2008 (coarrays), but the problem itself should occur with all(?) supported languages. It definitely occurs when using C with MPI or using C with the asynchronous I/O functions, though I do not know to what extent one currently relies on luck.

There are two issues, which need to be solved by informing the middle end, either by variable attributes or by inserted function calls:

- Prohibiting some code movements
- Making assumptions about the memory content

A) ASYNCHRONOUS attribute and asynchronous I/O

Fortran allows asynchronous I/O, which means for the programmer that between initiating the asynchronous read/write and its completion, the variable may not be accessed (for READ) or may not be changed (for WRITE). The compiler needs to make sure that it does not move code such that this constraint is violated. All variables involved in asynchronous operations are marked as ASYNCHRONOUS.

Thus, for asynchronous operations, code movements involving opaque function calls should not happen; but contrary to VOLATILE, there is no need to reload the value from memory on every access if it is still in a register.


Example:

   integer, ASYNCHRONOUS :: async_int

   READ (unit, ASYNCHRONOUS='yes') async_int
   ! ...
   WAIT (unit)
   a = async_int
   do i = 1, 10
     b(i) = async_int + 1
   end do

Here, "a = async_int" may not be moved before the WAIT line. However, contrary to VOLATILE, one can move the "async_int + 1" before the loop and use the value from the register in the loop. Note additionally that the initiation of an asynchronous operation (the data transfer statement above) is known at compile time; however, it is not known when it ends: the WAIT can be in a different translation unit. See also PR 25829.

The Fortran 2008 standard is not very explicit about the ASYNCHRONOUS attribute itself; it simply states that it is for asynchronous I/O. (However, it then describes how asynchronous I/O works, including WAIT, INQUIRE, and what a programmer may do until the asynchronous I/O is finished.) The closest thing to a definition of ASYNCHRONOUS is the non-normative Note 5.4 of Fortran 2008:

"The ASYNCHRONOUS attribute specifies the variables that might be associated with a pending input/output storage sequence (the actual memory locations on which asynchronous input/output is being performed) while the scoping unit is in execution. This information could be used by the compiler to disable certain code motion optimizations."

Seemingly intended, but not that clear in the F2003/F2008 standards, is to allow for asynchronous user operations; this will presumably be refined in TR 29113, which is currently being drafted, and/or in an interpretation request. The main requester of this feature is the MPI Forum, which is working on MPI3. In any case, the following should work analogously, and "buf" should not be moved before the "MPI_Wait" line:


  CALL MPI_Irecv(buf, rq)
  CALL MPI_Wait(rq)
  xnew=buf

Here, "buf" and (maybe?) the first dummy argument of MPI_Irecv have the ASYNCHRONOUS attribute.


My question is now: How does one properly tell this to the middle end?

VOLATILE seems to be wrong as it prevents way too many optimizations, and I think it does not completely prevent code movement. Using a call to some built-in function does not work, as in principle the end of an asynchronous operation is not known. It could end with a WAIT (possibly also wrapped in a function which is in a different translation unit) or also with an INQUIRE(..., PENDING=aio_pending) if "aio_pending" gets assigned .false.

(Frankly, I am not 100% sure about the exact semantics of ASYNCHRONOUS; I think it might be implemented by preventing all code movements which involve swapping an ASYNCHRONOUS variable with a function call that is not pure. Otherwise, in terms of the variable value, it acts like a normal variable, i.e. if one does "a = 7" and does not set "a" afterwards (by assignment or via function calls), it remains 7. The changing of the variable is explicit, even if it only becomes effective with some delay.)



B) COARRAYS

The memory model of coarrays is that all memory is private to the image, except for coarrays. Coarrays exist on all images. For "integer :: coarray(:)[*]", local accesses are "coarray = ..." or "coarray(4) = ...", while remote accesses are "coarray(:)[7] = ..." or "a = coarray(3)[2]", where the data is set on image 7 or pulled from image 2, respectively.


Let's start directly with an example:

   module m
     integer, save :: caf_int[*]  ! Global variable
   end module m

   subroutine foo()
     use m
     caf_int = 7  ! Set local variable to 7 (effectively: on image 1 only)
     SYNC ALL ! Memory barrier/fence
     SYNC ALL
     ! caf_int should now be 8, cf. below; thus the following IF must
     ! neither be optimized away nor be executed at run time.
     if (caf_int == 7) call abort()
   end subroutine foo

   subroutine bar()
     use m
     SYNC ALL
     caf_int[1] = 8 ! Set variable on image 1 to 8
     SYNC ALL
   end subroutine bar

   program caf_example
     if (this_image() == 1) CALL foo()
     if (this_image() == 2) CALL bar()
   end program caf_example

Notes:

- The coarray "caf_int" will be registered in the communication library at startup of the main program.
- For image 1, one always accesses "caf_int" in local memory. The variable also does not alias with anything, except that its value might change via single-sided communication.

Thus: SYNC ALL acts as a memory fence, for coarrays only. In principle, all other variables might be moved across the fence. Besides preventing code moves, the value of the variable cannot be assumed to be the same as before the fence. I think a simple call to "__sync_synchronize()" (alias BUILT_IN_SYNCHRONIZE) should take care of this, but I want to confirm that it indeed does so. I assume I can still keep all coarrays as restricted pointers, even though there is single-sided communication.


Q1: Is __sync_synchronize() sufficient?
Q2: Can this be optimized in some way?

Tobias
