Hi Kwok!

On 2019-06-24T20:37:12+0100, Kwok Cheung Yeung <k...@codesourcery.com> wrote:
> On 17/06/2019 6:24 pm, Thomas Schwinge wrote:
>>> Okay to push to openacc-gcc-9-branch?

>> I think what would be best, is the following approach: [...]

> I have now ported over the mainline patch over to OG9, plus an 
> additional patch on top of that to bring in the bits from OG8 that did 
> not make it upstream.
> 
> I have dropped the differences in comments, TODOs, documentation etc. in 
> favour of the upstream patch. There are also various places where the 
> OG8 patch sets up profiling, then gotos the end of the function to tear 
> it down again, whereas the mainline version just aborts early without 
> setting up profiling in the first place - these I have also resolved in 
> favour of the mainline version.

ACK, thanks!

> I have rerun the libgomp testsuite with no regressions noted.

(I didn't actually test, but just reviewed the code changes.)

> Okay to push to openacc-gcc-9-branch?

Yes, please push after re-testing with minor modifications in the second
commit, as follows:

> From 341285e282f5b7ed73daaeb9fd2f820dc1fe91f9 Mon Sep 17 00:00:00 2001
> From: Kwok Cheung Yeung <k...@codesourcery.com>
> Date: Fri, 21 Jun 2019 10:40:38 -0700
> Subject: [PATCH 2/2] Add changes to profiling interface from OG8 branch
>
> This bundles up the parts of the profiling code from the OG8 branch that were
> not included in the upstream patch.

> +2017-02-28  Thomas Schwinge  <tho...@codesourcery.com>
> +
> +     [...]
> +     * oacc-parallel.c (GOACC_parallel_keyed_internal): Set device_api for
> +     profiling.

> --- a/libgomp/oacc-parallel.c
> +++ b/libgomp/oacc-parallel.c
> @@ -275,6 +275,8 @@ GOACC_parallel_keyed_internal (int flags_m, int params, 
> void (*fn) (void *),
>        goacc_call_host_fn (fn, mapnum, hostaddrs, params);
>        goto out_prof;
>      }
> +  else if (profiling_p)
> +    api_info.device_api = acc_device_api_cuda;

That change is not quite right, and I'm pretty sure it wasn't me who
introduced that code ;-P -- but that can be resolved later.

> --- a/libgomp/Makefile.in
> +++ b/libgomp/Makefile.in
> @@ -524,7 +525,7 @@ search_path = $(addprefix $(top_srcdir)/config/, 
> $(config_path)) $(top_srcdir) \
>  
>  fincludedir = 
> $(libdir)/gcc/$(target_alias)/$(gcc_version)$(MULTISUBDIR)/finclude
>  libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include
> -libgomp_la_LIBADD = $(LIBFFI)
> +@USE_LIBFFI_TRUE@libgomp_la_LIBADD = $(LIBFFI)

That must've been an earlier commit not properly regenerating the file,
but we shall not bother with that now.

> diff --git a/libgomp/config/nvptx/oacc-profiling-acc_register_library.c 
> b/libgomp/config/nvptx/oacc-profiling-acc_register_library.c
> new file mode 100644
> index 0000000..e69de29
> diff --git a/libgomp/config/nvptx/oacc-profiling.c 
> b/libgomp/config/nvptx/oacc-profiling.c
> new file mode 100644
> index 0000000..e69de29

Not sure if these empty files are actually needed, but no harm done, so
that can be resolved later.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
> @@ -41,6 +41,7 @@ static int state = -1;
>  static acc_device_t acc_device_type;
>  static int acc_device_num;
>  static int num_gangs, num_workers, vector_length;
> +static int async;

All these 'async' changes in this file logically belong into the
respective commit of the OpenACC 'kernels' changes.  It's not a problem
to have them included here; we shall just try to remember to include them
in the OpenACC 'kernels' trunk changes.  (I've made a note; no need for
you to re-test/re-post.)

> @@ -165,6 +166,15 @@ int main()
>        for (int i = 0; i < N; ++i)
>       x[i] = i * i;
>      }
> +#ifdef __OPTIMIZE__
> +    /* TODO.  With -O2 optimizations enabled, the compiler believes that here
> +       "state == 0" still holds.  It's not yet clear what's going on.
> +       Mis-optimization across the GOMP function call boundary?  Per its
> +       gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL
> +       "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the 
> compiler
> +       must expect calls back into this compilation unit?  */
> +    asm volatile ("" : : : "memory");
> +#endif

That workaround is no longer needed given the more specific workaround
that I've added, marked with "TODO PR90488".

> @@ -184,12 +196,22 @@ int main()
>        for (int i = 0; i < N; ++i)
>       x[i] = i * i;
>      }
> +#ifdef __OPTIMIZE__
> +    /* TODO.  With -O2 optimizations enabled, the compiler believes that here
> +       "state == 0" still holds.  It's not yet clear what's going on.
> +       Mis-optimization across the GOMP function call boundary?  Per its
> +       gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL
> +       "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the 
> compiler
> +       must expect calls back into this compilation unit?  */
> +    asm volatile ("" : : : "memory");
> +#endif

Likewise.

> @@ -209,12 +233,22 @@ int main()
>        for (int i = 0; i < N; ++i)
>       x[i] = i * i;
>      }
> +#ifdef __OPTIMIZE__
> +    /* TODO.  With -O2 optimizations enabled, the compiler believes that here
> +       "state == 0" still holds.  It's not yet clear what's going on.
> +       Mis-optimization across the GOMP function call boundary?  Per its
> +       gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL
> +       "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the 
> compiler
> +       must expect calls back into this compilation unit?  */
> +    asm volatile ("" : : : "memory");
> +#endif

Likewise.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c
> @@ -697,6 +695,15 @@ int main()
>      }
>      assert (state_init == 4);
>    }
> +#ifdef __OPTIMIZE__
> +  /* TODO.  With -O2 optimizations enabled, the compiler believes that here
> +     "state == 0" still holds.  It's not yet clear what's going on.
> +     Mis-optimization across the GOMP function call boundary?  Per its
> +     gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL
> +     "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the compiler
> +     must expect calls back into this compilation unit?  */
> +  asm volatile ("" : : : "memory");
> +#endif

Likewise.

In 'libgomp.oacc-c-c++-common/acc_prof-version-1.c:main', we should also
remove the explicit call to 'acc_register_library'.


Grüße
 Thomas

Attachment: signature.asc
Description: PGP signature

Reply via email to