Hi Tobias! On 2023-07-28T13:51:41+0200, Tobias Burnus <tob...@codesourcery.com> wrote: > On 27.07.23 23:00, Thomas Schwinge wrote: >>> + else if (src_devicep != NULL >>> + && (dst_devicep == NULL >>> + || (dst_devicep->capabilities >>> + & GOMP_OFFLOAD_CAP_SHARED_MEM))) >> Are these 'GOMP_OFFLOAD_CAP_SHARED_MEM' actually reachable, given that >> 'omp_target_memcpy_check' (via 'omp_target_memcpy_rect_check') clears out >> the device to 'NULL' for 'GOMP_OFFLOAD_CAP_SHARED_MEM'? > > I have now undone this change – I did not dig deep enough into the > function calls. > > >>> + else if (dst_devicep == NULL && src_devicep == NULL) >>> + { >>> + memcpy ((char *) dst + dst_off, (const char *) src + src_off, >>> + length); >>> + ret = 1; >>> + } >>> else if (src_devicep == dst_devicep) >>> ret = src_devicep->dev2dev_func (src_devicep->target_id, >>> (char *) dst + dst_off, >>> (const char *) src + src_off, >>> length); >> ..., but also left the intra-device case here -- which should now be dead >> code here? > > Why? Unless I missed something, the old, the current, and the proposed > (= old) code do still run this code.
It is now again reachable, but wasn't in the "intermediate state" (without your follow-on change) -- at least per my understanding, which may be gappy. > I have not added an assert to confirm, but in any case, it is tested for > in my recently added testcase - thus, we could add a 'printf' to confirm. We're now back to the original code, which should be fine. >>> + else if (*tmp_size < length) >>> + { >>> + *tmp_size = length; >>> + *tmp = realloc (*tmp, length); >>> + if (*tmp == NULL) >>> + return ENOMEM; >> If 'realloc' returns 'NULL', we should 'free' the original '*tmp'? >> >> Do we really need here the property here that if the re-allocation can't >> be done in-place, 'realloc' copies the original content to the new? In >> other words, should we just unconditionally 'free' and re-'malloc' here, >> instead of 'realloc'? > I have now done so – but I am not really sure what's faster on average. > If it can be enlarged, 'realloc' is faster, if it cannot free+malloc is > better. I have no proof, but would assume that the C library handles as efficiently as 'realloc' the case of 'free' plus subsequent 'malloc' if there's space available after the original allocation. >> I haven't looked whether the re-use of 'tmp' for multiple calls to this >> is then actually useful, or whether we should just always 'malloc', use, >> 'free' the buffer here? ..., hence that suggestion. But I agree what we've got now is good, just one more idea: couldn't we now actually unify the (original) 'malloc' and (original) 'realloc' case: -if (*tmp_size == 0) - { - *tmp_size = length; - *tmp = malloc (length); - if (*tmp == NULL) - return ENOMEM; - } -else if (*tmp_size < length) +if (*tmp_size < length) { *tmp_size = length; free (*tmp); *tmp = malloc (length); if (*tmp == NULL) return ENOMEM; } (Untested.) > Well, it can run in a hot loop – assume a C-array array[1024][1024][2] > and copying array[:1024,:1024,0:1] (using OpenMP syntax) – i.e. 1048576 > times every other element. And therefore I would like to avoid repeated > malloc/free in such a case. (But in general, interdevice copying should > be very rare.) > > Actually, I think the realloc case is unreachable: for rectangular > copies, as implied both by 'target update' with strided access and by > 'omp_target_memcpy_rect', the size should be constant. Worth an 'assert'? Now I "only" don't understand the '__builtin_mul_overflow' computations and error checks ('return EINVAL;') for the four cases handled in 'libgomp/target.c:omp_target_memcpy_rect_worker': for example, the generic loop at end of function iterates over all 'num_dims', but the specific earlier 'num_dims == N' cases don't. But I suppose that's just my own problem not understanding this API/code, and I'm not currently planning on looking into this any further. ;-) Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955