Hi Josef, hi all,
Josef Melcr wrote:
this is the fifth version of this patch
(v4: https://gcc.gnu.org/pipermail/gcc-patches/2025-October/696559.html).
If I understand the patch correctly, it is currently for GCC internal
use, only, i.e. users that want to use the attribute in their own code
have to wait for follow up patch. (There is a space in the attribute name.)
Just because the naming was discussed at the Cauldron.
[That's perfectly fine as a start; it additionally implies that
changes can be done without having to worry about breaking
API/ABI compatibility.]
* * *
- Added a comment to ECF_CB definitions stating that they should be
used for builtin construction only.
Namely, currently, that's respectively, callback(1,2) for OpenMP
and callback(2,4) for OpenACC GOMP runtime functions.
[I think that's fine – and can be changed as needed.]
* * *
Similar to OpenMP's 'target', OpenACC's 'kernels' and 'parallel'
permit execution of code on the offload side. (They also permit
parallel execution on the host - but GCC only implements a
simple host fallback.)
As those permit offloading, the same as previously mentioned
for 'target' is true: In order that the offload function is
found, GOMP_ call needs to receive a function pointer that
is in the offload_funcs array (and, in particular, the host
and device version need to be in sync).
Thus the same care as for OpenMP's target needs to be done for:
----------------------------
#include <stdio.h>
#include <openacc.h>
int main()
{
int on_host;
int x = 3;
#pragma acc parallel copyout(on_host) copyin(x)
on_host = acc_on_device (acc_device_host) + x;
printf ("OpenACC 'parallel' was executed on %s\n",
on_host != 3 ? "the HOST" : "an ACCELERATOR DEVICE");
}
----------------------------
However, as the created function has the 'noclone' attribute,
there seems to be no issue with the v5 code. The optimized
dump looks as follows:
__attribute__((oacc function (1, 1, 1), oacc parallel, omp target entrypoint,
noclone))
void main._omp_fn.0 (const struct .omp_data_t.4 & restrict .omp_data_i)
which is called as:
__builtin_GOACC_parallel_keyed (-1, main._omp_fn.0, 2, &.omp_data_arr.5,
&.omp_data_sizes.6, &.omp_data_kinds.7, 0);
Background for the topic above is:
==============================
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -42,7 +42,7 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_EXIT_DATA,
"GOACC_exit_data",
ATTR_NOTHROW_LIST)
DEF_GOACC_BUILTIN (BUILT_IN_GOACC_PARALLEL, "GOACC_parallel_keyed",
BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_VAR,
- ATTR_NOTHROW_LIST)
+ ATTR_CALLBACK_OACC_LIST)
==============================
Thus, I wonder whether this should be skipped - and handled
in lock step with the OpenMP/omp target support.
* * *
The patch has been approved by Martin and Honza if some changes were
done, those were implemented here and in v4. Honza mentioned that
Jakub, Richi or Fortran reviewers might want comment on my addition
of some new ECF_CB constants, hence the CC.
I don't see an issue for Fortran at all – the Fortran part seems
to be only related to OpenMP/OpenACC and is rather non-invasive.
Besides user/external library code that might implement callbacks,
on the (pure) (g)Fortran side, function pointers pop up as:
- For Coarrays, I see that function points are used, but not
as a callback (→ accessor_hash_t).
- For matmul, it used to choose between the 'vanilla' and the
AVX/FMA version.
- for contained procedures with polymorphic variables.
None of those should be affected by this patch. I could imagine
some callback use (Fortran deep mapping of polymorphic variables
with OpenMP, polymorphic reductions with coarrays), but those
are presumably hard to optimize, not for hot code, and currently
not implemented.
* * *
Side remark: For Fortran comments, the fortran@ should be CCed,
but, as mentioned, I don't see an issue for gfortran at all -
and, hence, haven't CC'ed fortran@ in this email.
* * *
Thanks again for your patch! And to Honza/Martin for the review!
Tobias