On Wed, 16 Dec 2015, Tom de Vries wrote: > On 10/12/15 14:14, Tom de Vries wrote: > > [ copy-pasting-with-quote from > > https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html , for some > > reason I didn't get this email ] > > > > > On Thu, 3 Dec 2015, Tom de Vries wrote: > > > > The flag is set here in expand_omp_target: > > > > ... > > > > 12682 /* Prevent IPA from removing child_fn as unreachable, > > > > since there are no > > > > 12683 refs from the parent function to child_fn in offload > > > > LTO mode. */ > > > > 12684 if (ENABLE_OFFLOADING) > > > > 12685 cgraph_node::get (child_fn)->mark_force_output (); > > > > ... > > > > > > > > > > How are there no refs from the "parent"? Are there not refs from > > > some kind of descriptor that maps fallback CPU and offloaded variants? > > > > That descriptor is the offload table, which is emitted in > > omp_finish_file. The function iterates over vectors offload_vars and > > offload_funcs. > > > > [ I would guess there's a one-on-one correspondance between > > symtab_node::offloadable and membership of either offload_vars or > > offload_funcs. ] > > > > > I think the above needs sorting out in somw way, making the refs > > > explicit rather than implicit via force_output. > > > > I've tried an approach where I add a test for node->offloadable next to > > each test for node->force_output, except for the test in the nonlocal_p > > def in ipa_pta_execute. But I didn't (yet) manage to make that work. > > > > > > I guess setting forced_by_abi instead would also mean child_fn is not > > > > removed > > > > as unreachable, while still allowing optimizations: > > > > ... > > > > /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol > > > > to be exported. Unlike FORCE_OUTPUT this flag gets cleared to > > > > symbols promoted to static and it does not inhibit > > > > optimization. */ > > > > unsigned forced_by_abi : 1; > > > > ... > > > > > > > > But I suspect that other optimizations (than ipa-pta) might break > > > > things. > > > > > > How so? > > > > Probably it's more accurate to say that I do not understand the > > difference very well between force_output and force_by_abi, and what is > > the class of optimizations enabled by using forced_by_abi instead of > > force_output.' > > > > > > Essentially we have two situations: > > > > - in the host compiler, there is no need for the forced_output flag, > > > > and it inhibits optimization > > > > - in the accelerator compiler, it (or some equivalent) is needed > > > > Actually, things are slightly more complicated, I realize now. There's > > also the distinction between: > > - symbols declared as offloadable in the source code, and > > - symbols create by the compiler and marked offloadable > > > > > > I wonder if setting the force_output flag only when streaming the > > > > bytecode for > > > > offloading would work. That way, it wouldn't be set in the host > > > > compiler, > > > > while being set in the accelerator compiler. > > > > > > Yeah, that was my original thinking btw. > > > > FTR, I've tried that approach, as attached. It fixed the > > goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also > > using an accelerator configuration) without any regressions. > > How about this patch? > > We remove the setting of force_output when: > - encountering offloadable symbols in the frontend, or > - creating offloadable symbols in expand-omp. > > Instead, we set force_output in input_offload_tables. > > This is an improvement because: > - it moves the force_output setting to a single location > - it does the force_output setting ALAP
Works for me. Richard.