https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90591
Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
                 CC|                             |rguenth at gcc dot gnu.org
     Ever confirmed|0                            |1
   Last reconfirmed|                             |2020-04-20
             Status|UNCONFIRMED                  |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|sandra at gcc dot gnu.org

--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
I'm not very familiar with IPA optimizations, and I'm not yet clear on what
can be done (optimized) with reasonable effort in GCC, and where/how, so I
will very much appreciate your input, Jakub, Richard, others.

Sandra, who is not too familiar with OMP handling, is going to look into
this topic.

As a first step, we shall not consider any offloading compilation specifics,
but just the usual host-side compilation.

Looking at the following simple example:

    int main(void)
    {
      int var;

    #pragma acc parallel copyout(var)
    #pragma omp target map(from:var)
      {
        var = 1;
      }

      // 'var' never read.

      return 0;
    }

For reference, with '-fno-openacc -fno-openmp -O1' (so, OMP deactivated,
pragmas ignored), the 'var = 1' assignment disappears in the '030t.ccp1'
dump file, and 'var' itself disappears in the '047t.release_ssa' dump file.

With OMP enabled ('-fopenacc' shown in the following, but '-fopenmp' is very
similar), it's more difficult to optimize 'var': the OMP region (here: just
the 'var = 1' assignment, plus any set-up and tear-down code) is moved
(outlined) into a separate function 'main._omp_fn.0', and the address of
'var' is taken, stored in an internal data structure '.omp_data_arr.1', and
dereferenced in 'main._omp_fn.0' to access the original 'var'.  The outlined
function 'main._omp_fn.0' is called via 'GOACC_parallel_keyed'.

One step is to add logic so that in this example, we can optimize 'copyout'
to 'create'.  The 'copyout' clause is encoded in '.omp_data_kinds.3[0]' as
value '514', where with the upper data alignment byte stripped off,
'514 & 255 = 2', which is 'GOMP_MAP_FROM'.
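
The arithmetic above can be checked with a small sketch.  The value 2 for
the "from" kind is taken from the comment; the value 0 for the "alloc" kind
is an assumption, and the names 'SKETCH_MAP_*', 'map_kind' and
'with_map_kind' are hypothetical helpers made up for illustration, not GCC
or libgomp code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the map kinds named in the comment.  The value 2 for
   "from" comes from the text; the value 0 for "alloc" is an assumption.  */
enum { SKETCH_MAP_ALLOC = 0, SKETCH_MAP_FROM = 2 };

/* Hypothetical helper: strip the upper (alignment) byte, keep the kind.  */
unsigned map_kind(uint16_t encoded)
{
  return encoded & 255u;
}

/* Hypothetical helper: swap in a new kind, leave the upper byte alone.  */
uint16_t with_map_kind(uint16_t encoded, unsigned kind)
{
  return (uint16_t) ((encoded & ~255u) | (kind & 255u));
}

/* Re-check the numbers from the comment: 514 decodes to kind 2.  */
void check_example_from_comment(void)
{
  assert(map_kind(514) == SKETCH_MAP_FROM);
  assert(map_kind(with_map_kind(514, SKETCH_MAP_ALLOC)) == SKETCH_MAP_ALLOC);
}
```

Under these assumptions, rewriting the kind of '514' from "from" to "alloc"
leaves '512': the upper byte is unchanged and only the low byte is cleared.
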
This map kind shall -- at a suitable point in the pass pipeline -- be
optimized to 'GOMP_MAP_ALLOC'.

Another step is to add logic so that the "dead"ness of 'var' after the
outlined function 'main._omp_fn.0', which is called via
'GOACC_parallel_keyed', gets propagated into 'main._omp_fn.0', so that the
'var = 1' assignment can be eliminated.

By the way, there already is some special IPA information handling for
'GOACC_parallel_keyed' ('BUILT_IN_GOACC_PARALLEL') in
'gcc/tree-ssa-structalias.c'.  This is, if I remember correctly, to evaluate
aliasing "transparently through" the outlined OMP function; see the PR46032
commit r231076 "Handle BUILT_IN_GOMP_PARALLEL in ipa-pta" and later commits,
for reference.  (And, there is a 'pass_oacc_ipa' to enable '-fipa-pta' for
OpenACC OMP functions.)

Working on eliminating 'var' completely from the internal '.omp_data_arr.1'
etc. data structures shall be a separate step, for later.  If that
elimination happens when we're already in the offloading compilation
pipeline, this is expected to require more infrastructure to communicate
that information back from the offloading compiler(s) to the host
compiler/runtime.

But first, one step back -- I noticed that compiling the following:

    #pragma acc routine
    #pragma omp declare target
    static void __attribute__((noinline)) f(int *var)
    {
      *var = 1;
    }
    #pragma omp end declare target

    int main(void)
    {
      int var;

    #pragma acc parallel copyout(var)
    #pragma omp target map(from:var)
      {
        f(&var);
      }

      // 'var' never read.

      return 0;
    }

... with '-fno-openacc -fno-openmp -O1' (so, OMP deactivated, pragmas
ignored) also does *not* see any optimization of 'var'.  Is that something
that needs to be addressed first, before attempting the OMP case?
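
For orientation, the outlining described for the first example can be
modelled by hand in plain C.  This is only a sketch of the shape of the
transformed code, not what GCC emits: 'struct omp_data', 'outlined_region'
and 'launch' are hypothetical stand-ins for '.omp_data_arr.1',
'main._omp_fn.0' and the 'GOACC_parallel_keyed' call, and the real runtime
entry point takes more arguments than shown here.

```c
#include <assert.h>

/* Hypothetical stand-in for the '.omp_data_arr.1' structure: it carries
   the address of each mapped variable.  */
struct omp_data
{
  int *var;
};

/* Hypothetical stand-in for 'main._omp_fn.0': the region body, reaching
   the original 'var' through the data structure.  */
static void outlined_region(void *arg)
{
  struct omp_data *data = arg;
  *data->var = 1;
}

/* Hypothetical stand-in for the runtime launch call; here it simply runs
   the region on the host.  */
static void launch(void (*fn)(void *), void *arg)
{
  fn(arg);
}

/* Model of 'main' after outlining.  It returns 'var' only so that the
   effect of the region can be observed; in the example from the comment,
   'var' is never read after the region.  */
int run_model(void)
{
  int var = 0;
  struct omp_data data = { &var };
  launch(outlined_region, &data);
  return var;
}

/* Observe the store made inside the outlined region.  */
void check_model(void)
{
  assert(run_model() == 1);
}
```

In this model the address of 'var' escapes into the launch call, so removing
the store inside the region requires knowing both that the call only
forwards to the region and that 'var' is dead afterwards, which is the
information the steps above propose to make available.
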