Hi Tobias!

On 2023-07-28T13:51:41+0200, Tobias Burnus <tob...@codesourcery.com> wrote:
> On 27.07.23 23:00, Thomas Schwinge wrote:
>>> +      else if (src_devicep != NULL
>>> +            && (dst_devicep == NULL
>>> +                || (dst_devicep->capabilities
>>> +                    & GOMP_OFFLOAD_CAP_SHARED_MEM)))
>> Are these 'GOMP_OFFLOAD_CAP_SHARED_MEM' actually reachable, given that
>> 'omp_target_memcpy_check' (via 'omp_target_memcpy_rect_check') clears out
>> the device to 'NULL' for 'GOMP_OFFLOAD_CAP_SHARED_MEM'?
>
> I have now undone this change – I did not dig deep enough into the
> function calls.
>
>
>>> +      else if (dst_devicep == NULL && src_devicep == NULL)
>>> +     {
>>> +       memcpy ((char *) dst + dst_off, (const char *) src + src_off,
>>> +               length);
>>> +       ret = 1;
>>> +     }
>>>         else if (src_devicep == dst_devicep)
>>>        ret = src_devicep->dev2dev_func (src_devicep->target_id,
>>>                                         (char *) dst + dst_off,
>>>                                         (const char *) src + src_off,
>>>                                         length);
>> ..., but also left the intra-device case here -- which should now be dead
>> code here?
>
> Why? Unless I missed something, the old, the current, and the proposed
> (= old) code do still run this code.

It is now again reachable, but wasn't in the "intermediate state"
(without your follow-on change) -- at least per my understanding, which
may be gappy.

> I have not added an assert to confirm, but in any case, it is tested for
> in my recently added testcase - thus, we could add a 'printf' to confirm.

We're now back to the original code, which should be fine.

>>> +       else if (*tmp_size < length)
>>> +         {
>>> +           *tmp_size = length;
>>> +           *tmp = realloc (*tmp, length);
>>> +           if (*tmp == NULL)
>>> +             return ENOMEM;
>> If 'realloc' returns 'NULL', we should 'free' the original '*tmp'?
>>
>> Do we really need here the property here that if the re-allocation can't
>> be done in-place, 'realloc' copies the original content to the new?  In
>> other words, should we just unconditionally 'free' and re-'malloc' here,
>> instead of 'realloc'?
> I have now done so – but I am not really sure what's faster on average.
> If it can be enlarged, 'realloc' is faster, if it cannot free+malloc is
> better.

I have no proof, but would assume that the C library handles as
efficiently as 'realloc' the case of 'free' plus subsequent 'malloc' if
there's space available after the original allocation.

>> I haven't looked whether the re-use of 'tmp' for multiple calls to this
>> is then actually useful, or whether we should just always 'malloc', use,
>> 'free' the buffer here?

..., hence that suggestion.  But I agree what we've got now is good, just
one more idea: couldn't we now actually unify the (original) 'malloc' and
(original) 'realloc' case:

    -if (*tmp_size == 0)
    -  {
    -    *tmp_size = length;
    -    *tmp = malloc (length);
    -    if (*tmp == NULL)
    -      return ENOMEM;
    -  }
    -else if (*tmp_size < length)
    +if (*tmp_size < length)
       {
         *tmp_size = length;
         free (*tmp);
         *tmp = malloc (length);
         if (*tmp == NULL)
           return ENOMEM;
       }

(Untested.)

> Well, it can run in a hot loop – assume a C-array array[1024][1024][2]
> and copying array[:1024,:1024,0:1] (using OpenMP syntax) – i.e. 1048576
> times every other element. And therefore I would like to avoid repeated
> malloc/free in such a case. (But in general, interdevice copying should
> be very rare.)
>
> Actually, I think the realloc case is unreachable: for rectangular
> copies, as implied both by 'target update' with strided access and by
> 'omp_target_memcpy_rect', the size should be constant.

Worth an 'assert'?

Now I "only" don't understand the '__builtin_mul_overflow' computations
and error checks ('return EINVAL;') for the four cases handled in
'libgomp/target.c:omp_target_memcpy_rect_worker': for example, the
generic loop at end of function iterates over all 'num_dims', but the
specific earlier 'num_dims == N' cases don't.  But I suppose that's just
my own problem not understanding this API/code, and I'm not currently
planning on looking into this any further.  ;-)


Grüße
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

Reply via email to