Hi!

On 2021-02-09T13:05:22+0000, Julian Brown <jul...@codesourcery.com> wrote:
> On Tue, 9 Feb 2021 13:45:36 +0100
> Tobias Burnus <tob...@codesourcery.com> wrote:
>
>> On 09.02.21 12:58, Thomas Schwinge wrote:
>> >> Granted. The array(:)%re access might update too much, but that's
>> >> not different to array with strides or with contiguous arrays
>> >> sections which contain component reference (and more than one
>> >> component).
>> > (Is that indeed allowed to "update too much"?)
>>
>> Yes - that's the general problem with strides or bit sets;
>> copying only a subset – and doing so atomically – is not
>> always possible or feasible.
>>
>> *OpenACC* 3.1 has for "2.14.4 Update Directive" the restriction:
>>
>> "Noncontiguous subarrays may appear. It is implementation-specific
>> whether noncontiguous regions are updated by using one transfer
>> for each contiguous subregion, or whether the noncontiguous data
>> is packed, transferred once, and unpacked, or whether one or more
>> larger subarrays (no larger than the smallest contiguous region
>> that contains the specified subarray) are updated."
>>
>> For map, I saw that that's the case – but I think Julian's
>> patch does not handle this correctly for:
>>
>> type t
>>   integer :: i, j, k
>> end type t
>> type(t) :: A(100)
>> ... host(A(:)%j)

So I understand the variants in the quoted OpenACC part as follows.
However, I don't claim that I necessarily got all the fine detail right
at the language level (English as well as OpenACC)!  So, please verify.

"Using one transfer for each contiguous subregion":

    for n in 1..100:
      transfer A(n)%j individually

"Noncontiguous data is packed, transferred once, and unpacked":

    device:
      integer :: tmp(100)
      for n in 1..100:
        tmp(n) = A(n)%j
    transfer tmp
    host:
      for n in 1..100:
        A(n)%j = tmp(n)

In this example here, I understand "subarrays (no larger than the
smallest contiguous region that contains the specified subarray)" to
again resolve to 'A(n)%j', so it doesn't add another variant.

This -- per my reading! -- would be different here:

    type t
      integer :: i, j1, j2, k
    end type t
    type(t) :: A(100)
    ... host(A(:)%j1, A(:)%j2)

... where I understand this to mean that each 'A(n)%j1' plus 'A(n)%j2'
may be transferred together: either "using one transfer for each
contiguous subregion":

    for n in 1..100:
      transfer memory region of A(n)%j1..A(n)%j2 individually

..., or "packed, transferred once, and unpacked":

    device:
      integer :: tmp(2 * 100)
      for n in 1..100:
        tmp(2 * n - 1) = A(n)%j1
        tmp(2 * n) = A(n)%j2
    transfer tmp
    host:
      for n in 1..100:
        A(n)%j1 = tmp(2 * n - 1)
        A(n)%j2 = tmp(2 * n)

However, I do not read into this that the following would be valid:

>> I think instead of transferring A(1)%j to A(100)%j, it transfers
>> all of A(:), i.e. also A(1)%i and A(100)%k :-(

I don't think it's appropriate for an 'update' to alter anything else than
the exact 'var's as specified.

> Yes it will -- but given that A(2)%i and A(99)%k (and all the in-between
> values) can legitimately be transferred according to the spec

I don't read it that way, I'm afraid.  :-O

> how much
> of a problem is that? In particular, are there situations where this
> "over-updating" can lead to incorrect results in a conforming program?

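To make the "over-updating" question concrete, here is a minimal Python
sketch of the transfer strategies discussed above.  It is only a model, not
OpenACC, GCC, or libgomp code: host and device memory are plain Python
lists, the four integers of 'type t' (i, j1, j2, k) are assumed to be laid
out back to back, and every name in it (make_array, update_per_region,
update_packed, update_whole_array, untouched_fields_preserved) is made up
for illustration.

```python
# Hypothetical model of 'type(t) :: A(100)' with 'integer :: i, j1, j2, k'.
# Memory is a flat list; field f of element n lives at n * STRIDE + offset(f).
FIELDS = ("i", "j1", "j2", "k")
N = 100
STRIDE = len(FIELDS)


def make_array(base):
    """Return a flat memory image whose cells hold base, base + 1, ..."""
    return [base + idx for idx in range(N * STRIDE)]


def field_span(first, last):
    """Offsets (inclusive) of the contiguous field range first..last."""
    return FIELDS.index(first), FIELDS.index(last)


def update_per_region(host, device, first, last):
    """One transfer for each contiguous subregion A(n)%first..A(n)%last."""
    lo, hi = field_span(first, last)
    for n in range(N):
        start, stop = n * STRIDE + lo, n * STRIDE + hi + 1
        host[start:stop] = device[start:stop]


def update_packed(host, device, first, last):
    """Pack on the device, transfer once, unpack on the host."""
    lo, hi = field_span(first, last)
    width = hi - lo + 1
    tmp = []
    for n in range(N):  # device side: pack
        tmp.extend(device[n * STRIDE + lo:n * STRIDE + hi + 1])
    # (the single transfer of tmp would happen here)
    for n in range(N):  # host side: unpack
        start = n * STRIDE + lo
        host[start:start + width] = tmp[n * width:(n + 1) * width]


def update_whole_array(host, device):
    """Over-update: copy all of A(:), including fields that were not listed."""
    host[:] = device


def untouched_fields_preserved(before, after, first, last):
    """True if every cell outside first..last kept its host value."""
    lo, hi = field_span(first, last)
    return all(
        before[idx] == after[idx]
        for idx in range(len(before))
        if not lo <= idx % STRIDE <= hi
    )
```

With host = make_array(0) and device = make_array(1000), the first two
updates leave every host-side 'i' and 'k' alone, whereas update_whole_array
overwrites them.  That difference only matters if the host and device
copies may legitimately hold different data outside the listed 'var's.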
In your reading indeed it wouldn't, because the user couldn't expect the
following:

> Perhaps the question is, can a user legitimately expect the host and
> offloaded versions of some given memory block to hold different data,
> like maintaining different data in a cache than the storage backing
> that cache? One use-case for that might be double buffering a "single
> array" (i.e. the host and device versions of that array). I don't think
> that's something we'd want to encourage, though.

I find the wording in the spec rather explicit, for example: OpenACC 3.1,
2.7 "Data Clauses":

| In all cases, the compiler will allocate and manage a copy of the 'var'
| in the memory of the current device, creating a *visible device copy*
| of that 'var', for data not in shared memory.

Emphasis mine; and I indeed understand this to mean that the user can
"legitimately expect the host and offloaded versions of some given memory
block to hold different data, [...]" (your words from above).

> I think, rather, that partial updates are an optimisation the user can
> use when they know that only part of an array has been updated, so
> slight over-copying is harmless.

Interesting -- and, again: that makes sense in your reading.

So.  Should everybody work through this again, trying to reach consensus?
Do we need to clarify that with OpenACC?

As for OpenMP, Tobias stated:

|On 2021-02-09T13:45:36+0100, Tobias Burnus <tob...@codesourcery.com> wrote:
|> For OpenMP and map, I recall encountering a code which did do
|> this for OpenMP (i.e. contiguous subsection). I think it was
|> related to derived-type 'map', but I do not recall anymore.
|>
|>
|> Looking at the *OpenMP* 5.1 spec, I see that 'target update' also
|> allows: "The list items that appear in the to or from clauses
|> may include array sections with stride expressions."
|> While for the map clause, there is:
|> 'If a list item is an array section, it must specify contiguous
|> storage.'

(I suppose we all agree on that.)

|> But I did not see a more explicit description how that should be
|> handled, contrary to the rather explicit description for OpenACC.

Surprising.  Can some relevant wording (like OpenACC's "visible device
copy") be found elsewhere (in the hundreds of pages...)?

By default, I would've assumed "defensive", so again: "I don't think it's
appropriate for an 'update' to alter anything else than the exact 'var's
as specified." (my words from above)?


Regards
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf