Dear all,
I have a question about how one can and should tell the middle end about
asynchronous/single-sided memory access; the goal is to produce fast but
race-free code. All of the following is about Fortran 2003 (asynchronous)
and Fortran 2008 (coarrays), but the problem itself should occur with
all(?) supported languages. It definitely occurs when using C with MPI
or with the asynchronous I/O functions, though I do not know to what
extent one currently relies on luck.
There are two issues, which need to be solved by informing the middle
end - either by variable attributes or by inserted function calls:
- Prohibiting some code movements
- Making assumptions about the memory content
a) ASYNCHRONOUS attribute and asynchronous I/O
Fortran allows asynchronous I/O, which means for the programmer that
between initiating the asynchronous read/write and the statement that
completes it, the variable must not be accessed (for READ) or must not
be changed (for WRITE). The compiler needs to make sure that it does not
move code such that this constraint is violated. All variables involved
in asynchronous operations are marked as ASYNCHRONOUS.
Thus, for asynchronous operations, accesses to such variables should not
be moved across opaque function calls - but contrary to VOLATILE, there
is no need to reload the value from memory every time if it is still in
a register.
Example:
  integer, ASYNCHRONOUS :: async_int
  READ (unit, ASYNCHRONOUS='yes') async_int
  ! ...
  WAIT (unit)
  a = async_int
  do i = 1, 10
    b(i) = async_int + 1
  end do
Here, "a = async_int" must not be moved before the WAIT line. However,
contrary to VOLATILE, one can hoist the "async_int + 1" out of the loop
and use the value from a register in the loop. Note additionally that
the initiation of an asynchronous operation (READ statement above) is
known at compile time; however, it is not known when it ends - the
WAIT can be in a different translation unit. See also PR 25829.
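For readers more at home in C, the constraint can be modelled in a few
lines. This is only a sketch under stated assumptions: async_read_start()
and async_wait() are hypothetical stand-ins (not a real I/O API) in which
the transfer into the buffer completes inside the wait call. Because the
address of the buffer has escaped, a C compiler has to assume that an
opaque call may change it, which is roughly the behaviour wanted from
ASYNCHRONOUS: the load is not moved above the wait, yet the
loop-invariant expression can still be computed once.

```c
#include <stddef.h>

/* Hypothetical stand-in for an asynchronous transfer: the start call only
   records the request; the data lands in *dst inside async_wait(). */
static int *pending_dst = NULL;
static int  pending_val = 0;

void async_read_start(int *dst, int value)
{
    pending_dst = dst;
    pending_val = value;
}

void async_wait(void)
{
    if (pending_dst != NULL) {
        *pending_dst = pending_val;   /* transfer completes here */
        pending_dst = NULL;
    }
}

/* Mirrors the Fortran fragment above: returns a + b(10). */
int consume(int incoming)
{
    int async_int = 0;
    int b[10];

    async_read_start(&async_int, incoming);
    /* ... unrelated work may be scheduled here ... */
    async_wait();

    int a = async_int;      /* must not be hoisted above async_wait() */
    int t = async_int + 1;  /* invariant: one load suffices, unlike volatile */
    for (int i = 0; i < 10; i++)
        b[i] = t;

    return a + b[9];
}
```

Calling consume(41) yields 41 + 42; if the load of async_int were moved
above async_wait(), the stale initial value 0 would be used instead.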
The Fortran 2008 standard is not very explicit about the ASYNCHRONOUS
attribute itself; it simply states that it is for asynchronous I/O.
(It does, however, then describe how async I/O works, including WAIT,
INQUIRE, and what a programmer may do until the async I/O is finished.)
The closest thing to a definition of ASYNCHRONOUS is the non-normative
note 5.4 of Fortran 2008:
"The ASYNCHRONOUS attribute specifies the variables that might be
associated with a pending input/output storage sequence (the actual
memory locations on which asynchronous input/output is being performed)
while the scoping unit is in execution. This information could be used
by the compiler to disable certain code motion optimizations."
Seemingly intended, but not that clear in the F2003/F2008 standard, is
to allow for asynchronous user operations; this will presumably be
refined in TR 29113, which is currently being drafted - and/or in an
interpretation request. The main requester of this feature is the MPI
Forum, which is working on MPI3. In any case, the following should work
analogously, and the read of "buf" should not be moved before the
"MPI_Wait" line:
  CALL MPI_Irecv(buf, rq)
  CALL MPI_Wait(rq)
  xnew = buf
Here, "buf" and (maybe?) the first dummy argument of MPI_Irecv have
the ASYNCHRONOUS attribute.
My question is now: How do I properly tell the middle end about this?
VOLATILE seems to be wrong, as it prevents way too many optimizations and
I think it does not completely prevent code motion. Using a call to some
built-in function does not work, as in principle the point where an
asynchronous operation ends is not known at compile time. It could end
with a WAIT - possibly also wrapped in a function which is in a
different translation unit - or with an
INQUIRE(..., PENDING=aio_pending) if "aio_pending" is set to .false.
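To make the cost of VOLATILE concrete, here is a small C sketch (an
illustration only; the function names are hypothetical). Both functions
compute the same result, but only the non-volatile one leaves the
compiler free to keep the loaded value in a register across the loop.

```c
/* Plain access: with restrict, *p is loop-invariant, so the compiler is
   free to load it once and keep it in a register. */
void fill_plain(const int *restrict p, int *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        b[i] = *p + 1;
}

/* volatile access: every iteration has to perform its own load of *p,
   even though nothing in this loop can change it. */
void fill_volatile(const volatile int *p, int *b, int n)
{
    for (int i = 0; i < n; i++)
        b[i] = *p + 1;
}

/* Helper for checking: both variants compute the same values. */
int same_result(int value)
{
    int x = value, b1[10], b2[10];

    fill_plain(&x, b1, 10);
    fill_volatile(&x, b2, 10);
    for (int i = 0; i < 10; i++)
        if (b1[i] != b2[i] || b1[i] != value + 1)
            return 0;
    return 1;
}
```

The difference is visible only in the generated code, not in the result,
which is why VOLATILE looks like too blunt a tool for ASYNCHRONOUS.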
(Frankly, I am not 100% sure about the exact semantics of ASYNCHRONOUS;
I think it might be implemented by preventing all code movements which
involve swapping an access to an ASYNCHRONOUS variable with a call to a
function that is not pure. Otherwise, in terms of the variable value, it
acts like a normal variable, i.e. if one does "a = 7" and does not set
"a" afterwards (by assignment or via function calls), it remains 7. The
changing of the variable is explicit - even if it only becomes effective
with some delay.)
b) Coarrays
The memory model of coarrays is that all memory is private to the image
- except for coarrays. Coarrays exist on all images. For "integer ::
coarray(:)[*]", local accesses are "coarray = ..." or "coarray(4) = ...",
while remote accesses are "coarray(:)[7] = ..." or "a = coarray(3)[2]",
where the data is set on image 7 or pulled from image 2.
Let's start directly with an example:
  module m
    integer, save :: caf_int[*]  ! Global variable
  end module m

  subroutine foo()
    use m
    caf_int = 7  ! Set local variable to 7 (effectively: on image 1 only)
    SYNC ALL     ! Memory barrier/fence
    SYNC ALL
    ! caf_int should now be 8, cf. below; thus the compiler must not
    ! assume that caf_int is still 7, and abort() must not be called
    ! at run time.
    if (caf_int == 7) call abort()
  end subroutine foo

  subroutine bar()
    use m
    SYNC ALL
    caf_int[1] = 8  ! Set variable on image 1 to 8
    SYNC ALL
  end subroutine bar

  program caf_example
    if (this_image() == 1) CALL foo()
    if (this_image() == 2) CALL bar()
  end program caf_example
Notes:
- The coarray "caf_int" will be registered in the communication library
at startup of the main program.
- For image 1 one always accesses "caf_int" in local memory. The
variable also does not alias with anything - except that the value might
change via single-sided communication.
Thus: SYNC ALL acts as a memory fence - for coarrays only. In principle,
accesses to all other variables might be moved across the fence. Besides
preventing code motion, the fence also means that the value of a coarray
cannot be assumed to be the same as before the fence. I think a simple
call to "__sync_synchronize()" (alias BUILT_IN_SYNCHRONIZE) should take
care of this, but I want to confirm that it indeed does so. I assume I
can still keep all coarrays as restricted pointers - even though there
is single-sided communication.
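The behaviour expected from the fence can be pinned down with a small C
model. This is a single-process sketch, not a real coarray runtime:
remote_put() and sync_all() are hypothetical names standing in for the
communication library, which is assumed to have registered the address
of caf_int at startup and to apply one-sided writes at synchronization
points. It does not settle whether __sync_synchronize() alone is
sufficient in the middle end; it only shows the observable result that
any solution has to preserve.

```c
/* Models the coarray from the example above. */
static int caf_int;

/* Assumption: the communication library holds the registered address and
   applies queued one-sided writes when the images synchronize. */
static int *registered = &caf_int;
static int  queued_value;
static int  has_queued;

/* Hypothetical one-sided write, as issued by another image. */
void remote_put(int value)
{
    queued_value = value;
    has_queued = 1;
}

/* Hypothetical SYNC ALL: full barrier plus delivery of pending writes. */
void sync_all(void)
{
    __sync_synchronize();          /* GCC builtin: full memory barrier */
    if (has_queued) {
        *registered = queued_value;
        has_queued = 0;
    }
    __sync_synchronize();
}

/* What image 1 observes in foo(). */
int foo_model(void)
{
    caf_int = 7;
    sync_all();
    remote_put(8);    /* what image 2 does between the two fences */
    sync_all();
    return caf_int;   /* has to be re-read after the fence */
}
```

Here foo_model() has to return 8; a compiler that carried the value 7
across the second sync_all() would reproduce exactly the bug that the
"if (caf_int == 7)" check in foo() is meant to catch.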
Q1: Is __sync_synchronize() sufficient?
Q2: Can this be optimized in some way?
Tobias