On Wed, 16 Dec 2015, Tom de Vries wrote:

> On 10/12/15 14:14, Tom de Vries wrote:
> > [ copy-pasting-with-quote from
> > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some
> > reason I didn't get this email ]
> > 
> > > On Thu, 3 Dec 2015, Tom de Vries wrote:
> > > > The flag is set here in expand_omp_target:
> > > > ...
> > > > 12682         /* Prevent IPA from removing child_fn as unreachable,
> > > >                  since there are no
> > > > 12683            refs from the parent function to child_fn in offload
> > > >                  LTO mode.  */
> > > > 12684         if (ENABLE_OFFLOADING)
> > > > 12685           cgraph_node::get (child_fn)->mark_force_output ();
> > > > ...
> > > > 
> > > 
> > > How are there no refs from the "parent"?  Are there not refs from
> > > some kind of descriptor that maps fallback CPU and offloaded variants?
> > 
> > That descriptor is the offload table, which is emitted in
> > omp_finish_file. The function iterates over vectors offload_vars and
> > offload_funcs.
> > 
> > [ I would guess there's a one-on-one correspondance between
> > symtab_node::offloadable and membership of either offload_vars or
> > offload_funcs. ]
> > 
> > > I think the above needs sorting out in somw way, making the refs
> > > explicit rather than implicit via force_output.
> > 
> > I've tried an approach where I add a test for node->offloadable next to
> > each test for node->force_output, except for the test in the nonlocal_p
> > def in ipa_pta_execute. But I didn't (yet) manage to make that work.
> > 
> > > > I guess setting forced_by_abi instead would also mean child_fn is not
> > > > removed
> > > > as unreachable, while still allowing optimizations:
> > > > ...
> > > >   /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
> > > >      to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
> > > >      symbols promoted to static and it does not inhibit
> > > >      optimization.  */
> > > >   unsigned forced_by_abi : 1;
> > > > ...
> > > > 
> > > > But I suspect that other optimizations (than ipa-pta) might break
> > > > things.
> > > 
> > > How so?
> > 
> > Probably it's more accurate to say that I do not understand the
> > difference very well between force_output and force_by_abi, and what is
> > the class of optimizations enabled by using forced_by_abi instead of
> > force_output.'
> > 
> > > > Essentially we have two situations:
> > > > - in the host compiler, there is no need for the forced_output flag,
> > > >   and it inhibits optimization
> > > > - in the accelerator compiler, it (or some equivalent) is needed
> > 
> > Actually, things are slightly more complicated, I realize now. There's
> > also the distinction between:
> > - symbols declared as offloadable in the source code, and
> > - symbols create by the compiler and marked offloadable
> > 
> > > > I wonder if setting the force_output flag only when streaming the
> > > > bytecode for
> > > > offloading would work. That way, it wouldn't be set in the host
> > > > compiler,
> > > > while being set in the accelerator compiler.
> > > 
> > > Yeah, that was my original thinking btw.
> > 
> > FTR, I've tried that approach, as attached. It fixed the
> > goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also
> > using an accelerator configuration) without any regressions.
> 
> How about this patch?
> 
> We remove the setting of force_output when:
> - encountering offloadable symbols in the frontend, or
> - creating offloadable symbols in expand-omp.
> 
> Instead, we set force_output in input_offload_tables.
> 
> This is an improvement because:
> - it moves the force_output setting to a single location
> - it does the force_output setting ALAP

Works for me.

Richard.

Reply via email to