On 12/28/2017 05:14 PM, Jakub Jelinek wrote:
On Thu, Dec 28, 2017 at 05:06:57PM +0100, Jakub Jelinek wrote:
This has O(n^2) complexity for n == vec_safe_length (offload_funcs).
Can't you instead just have 2 IVs, one for where we read the vector elt and
one for where we write it if the 2 are different, then truncate the vector
if needed at the end?


Done.

Another thing, I think you can safely remove elts from the vector (== from
the host and offloading target arrays) only when !flag_lto, because we rely
on the two arrays being the same.

I now mark the offload_funcs with DECL_PRESERVE_P in expand_omp_target if flag_lto, so AFAIU the removal should not happen anymore for flag_lto.

 So you can't remove elts only on the host
and not on the device, or vice versa.  The output_offload_tables function
has:
   /* In WHOPR mode during the WPA stage the joint offload tables need to be
      streamed to one partition only.  That's why we free offload_funcs and
      offload_vars after the first call of output_offload_tables.  */
   if (flag_wpa)
     {
       vec_free (offload_funcs);
       vec_free (offload_vars);
     }
so at least with flag_wpa, if we remove anything in there, it won't be
reflected by the other tables.  So, can we do something different in case
we can't easily remove stuff from the vector anymore?  Either store some
placeholder in the tables (dunno if NULL would work or what),

I've tried NULL, that didn't work.

or instead
ensure corresponding functions can't be removed?


That's the approach I've chosen, as described above.

Maybe this removal if (!flag_lto) could be done earlier, e.g. at the
beginning of lto_output, and for nodes we keep around in the table
past that point set DECL_PRESERVE_P to 1 on the fndecl, so that we then
stream that flag.

Done.

Bootstrapped and reg-tested on x86_64.
Build and reg-tested for x86_64 with nvptx accelerator.

OK for trunk?

Thanks,
- Tom
Prune removed funcs from offload table

2017-12-27  Tom de Vries  <t...@codesourcery.com>

	PR libgomp/83046
	* omp-expand.c (expand_omp_target): If flag_lto, mark offload_funcs with
	DECL_PRESERVE_P.
	* lto-streamer-out.c (lto_output): Remove offload_funcs entries
	that no longer have a corresponding cgraph_node.  If !flag_lto, mark the
	remaining ones as DECL_PRESERVE_P.

	* testsuite/libgomp.oacc-c-c++-common/pr83046.c: New test.
	* testsuite/libgomp.c-c++-common/pr83046.c: New test.

---
 gcc/lto-streamer-out.c                             | 26 ++++++++++++++++++++++
 gcc/omp-expand.c                                   |  6 ++++-
 libgomp/testsuite/libgomp.c-c++-common/pr83046.c   | 25 +++++++++++++++++++++
 .../testsuite/libgomp.oacc-c-c++-common/pr83046.c  | 25 +++++++++++++++++++++
 4 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index ba29bd0..c38e389 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "gomp-constants.h"
 #include "debug.h"
+#include "omp-offload.h"
 
 
 static void lto_write_tree (struct output_block*, tree, bool);
@@ -2355,6 +2356,31 @@ lto_output (void)
   int i, n_nodes;
   lto_symtab_encoder_t encoder = lto_get_out_decl_state ()->symtab_node_encoder;
 
+  bool truncated_p = false;
+  unsigned int write_index = 0;
+  for (unsigned read_index = 0; read_index < vec_safe_length (offload_funcs);
+       read_index++)
+    {
+      tree fn_decl = (*offload_funcs)[read_index];
+      bool remove_p = cgraph_node::get (fn_decl) == NULL;
+      if (remove_p)
+	{
+	  truncated_p = true;
+	  continue;
+	}
+
+      if (write_index != read_index)
+	(*offload_funcs)[write_index] = (*offload_funcs)[read_index];
+
+      write_index++;
+    }
+  if (truncated_p)
+    offload_funcs->truncate (write_index);
+
+  if (!flag_lto)
+    for (unsigned i = 0; i < vec_safe_length (offload_funcs); i++)
+      DECL_PRESERVE_P ((*offload_funcs)[i]) = 1;
+
   if (flag_checking)
     output = lto_bitmap_alloc ();
 
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 0248833..59237ff 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7058,7 +7058,11 @@ expand_omp_target (struct omp_region *region)
 
       /* Add the new function to the offload table.  */
       if (ENABLE_OFFLOADING)
-	vec_safe_push (offload_funcs, child_fn);
+	{
+	  if (flag_lto)
+	    DECL_PRESERVE_P (child_fn) = 1;
+	  vec_safe_push (offload_funcs, child_fn);
+	}
 
       bool need_asm = DECL_ASSEMBLER_NAME_SET_P (current_function_decl)
 		      && !DECL_ASSEMBLER_NAME_SET_P (child_fn);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/pr83046.c b/libgomp/testsuite/libgomp.c-c++-common/pr83046.c
new file mode 100644
index 0000000..90dcb70
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/pr83046.c
@@ -0,0 +1,25 @@
+/* { dg-do link } */
+
+#define N 100
+
+int
+main ()
+{
+  int a[N];
+  int i, x;
+  int c;
+
+  c = 1;
+#pragma omp target
+  for (i = 0; i < 100; i++)
+    a[i] = 0;
+
+  if (c)
+    __builtin_unreachable ();
+
+#pragma omp target
+  for (i = 0; i < 100; i++)
+    a[i] = 1;
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83046.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83046.c
new file mode 100644
index 0000000..a2a085c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr83046.c
@@ -0,0 +1,25 @@
+/* { dg-do link } */
+
+#define N 100
+
+int
+main ()
+{
+  int a[N];
+  int i, x;
+  int c;
+
+  c = 1;
+#pragma acc parallel loop
+  for (i = 0; i < 100; i++)
+    a[i] = 0;
+
+  if (c)
+    __builtin_unreachable ();
+
+#pragma acc parallel loop
+  for (i = 0; i < 100; i++)
+    a[i] = 1;
+
+  return 0;
+}

Reply via email to