Hi!

On 2021-02-09T13:05:22+0000, Julian Brown <jul...@codesourcery.com> wrote:
> On Tue, 9 Feb 2021 13:45:36 +0100
> Tobias Burnus <tob...@codesourcery.com> wrote:
>
>> On 09.02.21 12:58, Thomas Schwinge wrote:
>> >> Granted. The array(:)%re access might update too much, but that's
>> >> not different to array with strides or with contiguous arrays
>> >> sections which contain component reference (and more than one
>> >> component).
>> > (Is that indeed allowed to "update too much"?)
>>
>> Yes - that's the general problem with strides or bit sets;
>> copying only a subset – and doing so atomically – is not
>> always possible or feasible.
>>
>> *OpenACC* 3.1 has for "2.14.4  Update Directive" the restriction:
>>
>> "Noncontiguous subarrays may appear. It is implementation-specific
>>   whether noncontiguous regions are updated by using one transfer
>>   for each contiguous subregion, or whether the noncontiguous data
>>   is packed, transferred once, and unpacked, or whether one or more
>>   larger subarrays (no larger than the smallest contiguous region
>>   that contains the specified subarray) are updated."
>>
>> For map, I saw that that's the case – but I think Julian's
>> patch does not handle this correctly for:
>>
>> type t
>>    integer :: i, j, k
>> end type t
>> type(t) :: A(100)
>>    ... host(A(:)%j)

So I understand the variants in the quoted OpenACC part as follows.
However I don't claim that I necessarily got all the fine detail right at
the language level (English as well as OpenACC)!  So, please verify.

"Using one transfer for each contiguous subregion":

    for n in 1..100: transfer A(n)%j individually

"Noncontiguous data is packed, transferred once, and unpacked":

    device:
      integer :: tmp(100)
      for n in 1..100: tmp(n) = A(n)%j
    transfer tmp
    host:
      for n in 1..100: A(n)%j = tmp(n)

In this example here, I understand "subarrays (no larger than the
smallest contiguous region that contains the specified subarray)" to
again resolve to 'A(n)%j', so doesn't add other another variant.

This -- per my reading! -- would be different here:

    type t
       integer :: i, j1, j2, k
    end type t
    type(t) :: A(100)
       ... host(A(:)%j1, A(:)%j2)

... where I understand this to mean that each 'A(n)%j1' plus 'A(:)%j2'
may be transferred together: either "using one transfer for each
contiguous subregion":

    for n in 1..100: transfer memory region of A(n)%j1..A(n)%j2 individually

..., or "packed, transferred once, and unpacked":

    device:
      integer :: tmp(2 * 100)
      for n in 1..100:
        tmp(2 * n) = A(n)%j1
        tmp(2 * n + 1) = A(n)%j2
    transfer tmp
    host:
      for n in 1..100:
        A(n)%j1 = tmp(2 * n)
        A(n)%j2 = tmp(2 * n + 1)

I do however not read into this that the following would be valid:

>> I think instead of transferring A(1)%j to A(100)%j, it transfers
>> all of A(:), i.e. also A(1)%i and A(100)%k :-(

I don't think it's appropriate for an 'update' to alter anything else
than the exact 'var's as specified.

> Yes it will -- but given that A(2)%i and A(99)%k (and all the in-between
> values) can legitimately be transferred according to the spec

I don't read it that way, I'm afraid.  :-O

> how much
> of a problem is that? In particular, are there situations where this
> "over-updating" can lead to incorrect results in a conforming program?

In your reading indeed it wouldn't, because the user couldn't expect the
following:

> Perhaps the question is, can a user legitimately expect the host and
> offloaded versions of some given memory block to hold different data,
> like maintaining different data in a cache than the storage backing
> that cache? One use-case for that might be double buffering a "single
> array" (i.e. the host and device versions of that array). I don't think
> that's something we'd want to encourage, though.

I find the wording in the spec rather explicitly, for example: OpenACC
3.1, 2.7 "Data Clauses":

| In all cases, the compiler will allocate and manage a copy of the 'var'
| in the memory of the current device, creating a *visible device copy*
| of that 'var', for data not in shared memory.

Emphasis mine; and I indeed understand this to mean that the user can
"legitimately expect the host and offloaded versions of some given memory
block to hold different data, [...]" (your words from above).

> I think, rather, that partial updates are an optimisation the user can
> use when they know that only part of an array has been updated, so
> slight over-copying is harmless.

Interesting -- and, again: that makes sense in your reading.


So.  Should everybody work through this again, trying to reach consensus?
Do we need to clarify that with OpenACC?


As for OpenMP, Tobias stated:

|On 2021-02-09T13:45:36+0100, Tobias Burnus <tob...@codesourcery.com> wrote:
|> For OpenMP and map, I recall encountering a code which did do
|> this for OpenMP (i.e. contiguous subsection). I think it was
|> related to derived-type 'map', but I do not recall anymore.
|>
|>
|> Looking at the *OpenMP* 5.1 spec, I see that 'target update' also
|> allows: "The list items that appear in the to or from clauses
|> may include array sections with stride expressions."
|> While for the map clause, there is:
|> 'If a list item is an array section, it must specify contiguous
|> storage.'

(I suppose we all agree on that.)

|> But I did not see a more explicit description how that should be
|> handled, contrary to the rather explicit description for OpenACC.

Surprising.  Can some relevant wording (like OpenACC's "visible device
copy") be found elsewhere (in the hundreds of pages...)?

By default, I would've assumed "defensive", so again: "I don't think it's
appropriate for an 'update' to alter anything else than the exact 'var's
as specified." (my words from above)?


Grüße
 Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf

Reply via email to