[ Copy-pasting with quotes from https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00420.html, since for some reason I didn't get this email ]

On Thu, 3 Dec 2015, Tom de Vries wrote:
The flag is set here in expand_omp_target:
...
  /* Prevent IPA from removing child_fn as unreachable, since there are no
     refs from the parent function to child_fn in offload LTO mode.  */
  if (ENABLE_OFFLOADING)
    cgraph_node::get (child_fn)->mark_force_output ();
...


How are there no refs from the "parent"?  Are there not refs from
some kind of descriptor that maps fallback CPU and offloaded variants?

That descriptor is the offload table, which is emitted in omp_finish_file. The function iterates over vectors offload_vars and offload_funcs.

[ I would guess there's a one-to-one correspondence between symtab_node::offloadable and membership of either offload_vars or offload_funcs. ]

I think the above needs sorting out in some way, making the refs
explicit rather than implicit via force_output.
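To make the "explicit refs" idea concrete, here is a toy model in C of an unreachable-node removal walk. Everything in it (toy_node, toy_mark_reachable, the fixed-size refs array) is hypothetical and is not GCC's symtab/cgraph code. It only illustrates the reasoning: a node survives if it is reachable from a root, so treating the offload table as a root would keep child_fn alive without setting force_output on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT GCC's symtab_node/cgraph_node.  */
#define TOY_MAX_REFS 4

struct toy_node
{
  bool force_output;                    /* Treated as a root.  */
  bool reachable;                       /* Result of the marking walk.  */
  size_t n_refs;                        /* Number of outgoing refs.  */
  struct toy_node *refs[TOY_MAX_REFS];  /* Outgoing refs (calls, addresses).  */
};

/* Depth-first walk marking NODE and everything it refers to.  */
static void
toy_mark_from (struct toy_node *node)
{
  if (node->reachable)
    return;
  node->reachable = true;
  for (size_t i = 0; i < node->n_refs; i++)
    toy_mark_from (node->refs[i]);
}

/* Roots are the nodes with force_output set, plus every entry of TABLE
   (standing in for the offload table).  Nodes left unmarked are what an
   unreachable-node removal would drop.  */
static void
toy_mark_reachable (struct toy_node **nodes, size_t n_nodes,
                    struct toy_node **table, size_t n_table)
{
  for (size_t i = 0; i < n_nodes; i++)
    nodes[i]->reachable = false;
  for (size_t i = 0; i < n_nodes; i++)
    if (nodes[i]->force_output)
      toy_mark_from (nodes[i]);
  for (size_t i = 0; i < n_table; i++)
    toy_mark_from (table[i]);
}
```

With only the parent marked force_output and an empty table, the child stays unmarked, which is the situation the mark_force_output call works around. Passing the child in the table marks it reachable while its force_output bit stays clear.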

I've tried an approach where I add a test for node->offloadable next to each test for node->force_output, except for the test in the definition of nonlocal_p in ipa_pta_execute. But I haven't managed to make that work yet.

I guess setting forced_by_abi instead would also mean child_fn is not removed
as unreachable, while still allowing optimizations:
...
  /* Like FORCE_OUTPUT, but in the case it is ABI requiring the symbol
     to be exported.  Unlike FORCE_OUTPUT this flag gets cleared to
     symbols promoted to static and it does not inhibit
     optimization.  */
  unsigned forced_by_abi : 1;
...
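A literal reading of the quoted forced_by_abi comment, written as a toy in C. The struct and functions are hypothetical, not GCC's; they only encode the three claims the comment makes: both flags keep an unreferenced symbol, only force_output inhibits optimization, and forced_by_abi is cleared when the symbol is promoted to static. If that reading is right, a symbol protected only by forced_by_abi loses that protection once it is promoted to static.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy flags modelled on the quoted comment, NOT GCC's
   symtab_node.  */
struct toy_flags
{
  unsigned force_output : 1;
  unsigned forced_by_abi : 1;
};

/* Both flags keep the symbol from being removed as unreachable.  */
static bool
toy_kept_if_unreferenced (const struct toy_flags *f)
{
  return f->force_output || f->forced_by_abi;
}

/* Per the comment, only force_output inhibits optimization.  */
static bool
toy_inhibits_optimization (const struct toy_flags *f)
{
  return f->force_output;
}

/* Per the comment, promotion to static clears forced_by_abi;
   force_output is left alone.  */
static void
toy_promote_to_static (struct toy_flags *f)
{
  f->forced_by_abi = 0;
}
```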

But I suspect that other optimizations (than ipa-pta) might break things.

How so?

It's probably more accurate to say that I don't understand very well the difference between force_output and forced_by_abi, or which class of optimizations is enabled by using forced_by_abi instead of force_output.

Essentially we have two situations:
- in the host compiler, there is no need for the force_output flag,
  and it inhibits optimization
- in the accelerator compiler, it (or some equivalent) is needed

Actually, things are slightly more complicated, I realize now. There's also the distinction between:
- symbols declared as offloadable in the source code, and
- symbols created by the compiler and marked offloadable

I wonder if setting the force_output flag only when streaming the bytecode for
offloading would work. That way, it wouldn't be set in the host compiler,
while being set in the accelerator compiler.

Yeah, that was my original thinking btw.

FTR, I've tried that approach, as attached. It fixed the goacc/kernels-alias-ipa-pta*.c failures. And I ran target-libgomp (also using an accelerator configuration) without any regressions.
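The patched lto_output_node line can be modelled with a toy writer/reader pair in C. The names, the struct, and the plain uint32_t word are hypothetical; GCC packs these bits with bp_pack_value on a bitpack instead. What the sketch shows is the intended asymmetry: the in-memory node on the writer side keeps force_output clear, so the host compiler's optimizations are not inhibited, while the node read back from the stream has force_output set for offloadable symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy node, NOT GCC's cgraph_node.  */
struct toy_node
{
  bool force_output;
  bool forced_by_abi;
  bool offloadable;
};

enum { TOY_BIT_FORCE_OUTPUT = 0, TOY_BIT_FORCED_BY_ABI = 1 };

/* Writer side: the streamed force_output bit is
   force_output || offloadable, mirroring the patched line.
   The node itself is not modified.  */
static uint32_t
toy_stream_out (const struct toy_node *node)
{
  uint32_t word = 0;
  word |= (uint32_t) (node->force_output || node->offloadable)
          << TOY_BIT_FORCE_OUTPUT;
  word |= (uint32_t) node->forced_by_abi << TOY_BIT_FORCED_BY_ABI;
  return word;
}

/* Reader side.  offloadable itself is not streamed in this toy.  */
static struct toy_node
toy_stream_in (uint32_t word)
{
  struct toy_node node = { false, false, false };
  node.force_output = (word >> TOY_BIT_FORCE_OUTPUT) & 1;
  node.forced_by_abi = (word >> TOY_BIT_FORCED_BY_ABI) & 1;
  return node;
}
```

An offloadable node with force_output clear comes back with force_output set, while a node that is neither offloadable nor force_output round-trips unchanged.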

Thanks,
- Tom

Set force_output in offload stream if offloadable

---
 gcc/lto-cgraph.c | 2 +-
 gcc/omp-low.c    | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 62e5454..c862b19 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -511,7 +511,7 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
   bp_pack_value (&bp, node->local.versionable, 1);
   bp_pack_value (&bp, node->local.can_change_signature, 1);
   bp_pack_value (&bp, node->local.redefined_extern_inline, 1);
-  bp_pack_value (&bp, node->force_output, 1);
+  bp_pack_value (&bp, node->force_output || node->offloadable, 1);
   bp_pack_value (&bp, node->forced_by_abi, 1);
   bp_pack_value (&bp, node->unique_name, 1);
   bp_pack_value (&bp, node->body_removed, 1);
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5643480..569cfd7 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12770,10 +12770,12 @@ expand_omp_target (struct omp_region *region)
 	assign_assembler_name_if_neeeded (child_fn);
       cgraph_edge::rebuild_edges ();
 
+#if 0
       /* Prevent IPA from removing child_fn as unreachable, since there are no
 	 refs from the parent function to child_fn in offload LTO mode.  */
       if (ENABLE_OFFLOADING)
 	cgraph_node::get (child_fn)->mark_force_output ();
+#endif
 
       /* Some EH regions might become dead, see PR34608.  If
 	 pass_cleanup_cfg isn't the first pass to happen with the
