On 27/11/15 12:42, Tom de Vries wrote:
On 23/11/15 12:41, Richard Biener wrote:
On Sat, 21 Nov 2015, Tom de Vries wrote:

>On 13/11/15 12:39, Jakub Jelinek wrote:
> >On Fri, Nov 13, 2015 at 12:29:51PM +0100, Richard Biener wrote:
> > > >thanks for the explanation. Filed as PR68331 - '[meta-bug]
fipa-pta
> > > >issues'.
> > > >
> > > >Any feedback on the '#pragma GCC
offload-alias=<none|pointer|all>' bit
> > > >above?
> > > >Is that sort of what you had in mind?
> > >
> > >Yes.  Whether that makes sense is another question of course.
You can
> > >annotate memory references with MR_DEPENDENCE_BASE/CLIQUE
yourself
> > >as well if you know dependences without the users intervention.
> >
> >I really don't like even the GCC offload-alias, I just don't see
anything
> >special on the offload code.  Not to mention that the same issue
is already
> >with other outlined functions, like OpenMP tasks or parallel
regions, those
> >aren't offloaded, yet they can suffer from worse alias/points-to
analysis
> >too.
>
>AFAIU there is one aspect that is different for offloaded code: the
setup of
>the data on the device.
>
>Consider this example:
>...
>unsigned int a[N];
>unsigned int b[N];
>unsigned int c[N];
>
>int
>main (void)
>{
>   ...
>
>#pragma acc kernels copyin (a) copyin (b) copyout (c)
>   {
>     for (COUNTERTYPE ii = 0; ii < N; ii++)
>       c[ii] = a[ii] + b[ii];
>   }
>
>   ...
>...
>
>At gimple level, we have:
>...
>#pragma omp target oacc_kernels \
>   map(force_from:c [len: 2097152]) \
>   map(force_to:b [len: 2097152]) \
>   map(force_to:a [len: 2097152])
>...
>
>[ The meaning of the force_from/force_to mappings is given in
>include/gomp-constants.h:
>...
>     /* Allocate.  */
>     GOMP_MAP_FORCE_ALLOC = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_ALLOC),
>     /* ..., and copy to device.  */
>     GOMP_MAP_FORCE_TO = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TO),
>     /* ..., and copy from device.  */
>     GOMP_MAP_FORCE_FROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM),
>     /* ..., and copy to and from device.  */
>     GOMP_MAP_FORCE_TOFROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM),
>...  ]
>
>So before calling the offloaded function, a separate alloc is done
for a, b
>and c, and the base pointers of the newly allocated objects are
passed to the
>offloaded function.
>
>This means we can mark those base pointers as restrict in the offloaded
>function.
>
>Attached proof-of-concept patch implements that.
>
> >We simply have some compiler internal interface between the
caller and
> >callee of the outlined regions, each interface in between those has
> >its own structure type used to communicate the info;
> >we can attach attributes on the fields, or some flags to indicate
some
> >properties interesting from aliasing POV.
> >We don't really need to perform
> >full IPA-PTA, perhaps it would be enough to a) record somewhere
in cgraph
> >the relationship in between such callers and callees (for
offloading regions
> >we already have "omp target entrypoint" attribute on the callee
and a
> >singler caller), tell LTO if possible not to split those into
different
> >partitions if easily possible, and then just for these pairs perform
> >aliasing/points-to analysis in the caller and the result record
using
> >cliques/special attributes/whatever to the callee side, so that
the callee
> >(outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias
analysis.
>
>As a start, is the approach of this patch OK?
Works for me but leaving to Jakub to review for correctness.

Attached patch is a complete version:
- added ChangeLog
- added missing function header comments
- moved analysis to separate function
   omp_target_base_pointers_restrict_p
- added example in comment before analysis
- fixed error in omp_target_base_pointers_restrict_p where I was using
   GOMP_MAP_ALLOC but should have been using GOMP_MAP_FORCE_ALLOC
- added testcases


This follow-up patch handles the case that we copy from/to pointers rather than declared variables:
...
       void foo (unsigned int *a, unsigned int *b)
       {
         #pragma acc kernels copyout (a[0:2]) copyout (b[0:2])
         {
           a[0] = 0;
           b[0] = 1;
         }
       }
...

After gimplification, we have:
...
     foo (unsigned int * a, unsigned int * b)
     {
       unsigned int * b.0;
       unsigned int * a.1;

       b.0 = b;
       a.1 = a;
       #pragma omp target oacc_kernels \
         map(force_from:*a.1 (*a) [len: 8]) \
         map(alloc:a [pointer assign, bias: 0]) \
         map(force_from:*b.0 (*b) [len: 8]) \
         map(alloc:b [pointer assign, bias: 0])
       {
         unsigned int * a.2;
         unsigned int * b.3;

         a.2 = a;
         *a.2 = 0;
         b.3 = b;
         *b.3 = 1;
      }
     }
...

We don't bail out of omp_target_base_pointers_restrict_p when encountering 'map(alloc:a [pointer assign, bias: 0])', given that we can find the matching 'map(force_from:*a.1 (*a) [len: 8])'.

Using this and the previous patch, I'm able to do auto-parallelization on all the oacc kernels c test-cases, with the obvious exception of the testcases where some of used variables are mapped using the 'present' tag (in other words, missing the force tag).

Bootstrapped and reg-tested on x86_64.

OK for stage3 trunk?

Thanks,
- Tom

Handle non-declared variables in kernels alias analysis

2015-11-27  Tom de Vries  <t...@codesourcery.com>

	* gimplify.c (gimplify_scan_omp_clauses): Initialize
	OMP_CLAUSE_ORIG_DECL.
	* omp-low.c (install_var_field_1): Handle base_pointers_restrict for
	pointers.
	(map_ptr_clause_points_to_clause_p)
	(nr_map_ptr_clauses_pointing_to_clause): New function.
	(omp_target_base_pointers_restrict_p): Handle GOMP_MAP_POINTER.
	* tree-pretty-print.c (dump_omp_clause): Print OMP_CLAUSE_ORIG_DECL.
	* tree.c (omp_clause_num_ops): Set num_ops for OMP_CLAUSE_MAP to 3.
	* tree.h (OMP_CLAUSE_ORIG_DECL): New macro.

	* c-c++-common/goacc/kernels-alias-10.c: New test.
	* c-c++-common/goacc/kernels-alias-9.c: New test.

---
 gcc/gimplify.c                                     |   1 +
 gcc/omp-low.c                                      | 134 ++++++++++++++++++++-
 .../c-c++-common/goacc/kernels-alias-10.c          |  29 +++++
 gcc/testsuite/c-c++-common/goacc/kernels-alias-9.c |  29 +++++
 gcc/tree-pretty-print.c                            |   8 ++
 gcc/tree.c                                         |   2 +-
 gcc/tree.h                                         |   5 +
 7 files changed, 205 insertions(+), 3 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index a3ed378..fcac745 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6713,6 +6713,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	  if (!DECL_P (decl))
 	    {
 	      tree d = decl, *pd;
+	      OMP_CLAUSE_ORIG_DECL (c) = copy_node (decl);
 	      if (TREE_CODE (d) == ARRAY_REF)
 		{
 		  while (TREE_CODE (d) == ARRAY_REF)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6843c49..8ae08c52 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1396,6 +1396,9 @@ install_var_field_1 (tree var, bool by_ref, int mask, omp_context *ctx,
     }
   else if (by_ref)
     {
+      if (base_pointers_restrict
+	  && POINTER_TYPE_P (type))
+	type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
       type = build_pointer_type (type);
       if (base_pointers_restrict)
 	type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
@@ -3057,6 +3060,64 @@ scan_omp_single (gomp_single *stmt, omp_context *outer_ctx)
     layout_type (ctx->record_type);
 }
 
+/* Return true if OMP_CLAUSE_DECL (MAP_POINTER_CLAUSE) points to
+   OMP_CLAUSE_DECL (CLAUSE).  */
+
+static bool
+map_ptr_clause_points_to_clause_p (tree map_pointer_clause, tree clause)
+{
+  gcc_assert (OMP_CLAUSE_CODE (map_pointer_clause) == OMP_CLAUSE_MAP);
+  gcc_assert (OMP_CLAUSE_MAP_KIND (map_pointer_clause) == GOMP_MAP_POINTER);
+
+  if (OMP_CLAUSE_CODE (clause) != OMP_CLAUSE_MAP)
+    return false;
+
+  tree orig_decl = OMP_CLAUSE_ORIG_DECL (clause);
+  if (orig_decl == NULL_TREE)
+    return false;
+
+  tree ptr_decl = OMP_CLAUSE_DECL (map_pointer_clause);
+  switch (TREE_CODE (orig_decl))
+    {
+    case ARRAY_REF:
+      if (!integer_zerop (TREE_OPERAND (orig_decl, 1)))
+	return false;
+
+      /* Fall through.  */
+    case INDIRECT_REF:
+      if (!operand_equal_p (ptr_decl, TREE_OPERAND (orig_decl, 0), 0))
+	return false;
+      break;
+    default:
+      return false;
+    }
+
+  return true;
+}
+
+/* Return the number of map_pointer clauses in CLAUSES pointing to CLAUSE.  */
+
+static unsigned int
+nr_map_ptr_clauses_pointing_to_clause (tree clauses, tree clause)
+{
+  unsigned int nr = 0;
+
+  tree c;
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
+	continue;
+
+      if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_POINTER)
+	continue;
+
+      if (map_ptr_clause_points_to_clause_p (c, clause))
+	nr++;
+    }
+
+  return nr;
+}
+
 /* Return true if the CLAUSES of an omp target guarantee that the base pointers
    used in the corresponding offloaded function are restrict.  */
 
@@ -3096,8 +3157,59 @@ omp_target_base_pointers_restrict_p (tree clauses)
      Because both mappings have the force prefix, we know that they will be
      allocated when calling the corresponding offloaded function, which means we
      can mark the base pointers for a and b in the offloaded function as
-     restrict.  */
+     restrict.
+
+     II.  GOMP_MAP_POINTER example:
 
+       void foo (unsigned int *a, unsigned int *b)
+       {
+	 #pragma acc kernels copyout (a[0:2]) copyout (b[0:2])
+	 {
+	   a[0] = 0;
+	   b[0] = 1;
+	 }
+       }
+
+     After gimplification, we have:
+
+     foo (unsigned int * a, unsigned int * b)
+     {
+       unsigned int * b.0;
+       unsigned int * a.1;
+
+       b.0 = b;
+       a.1 = a;
+       #pragma omp target oacc_kernels \
+	 map(force_from:*a.1 (*a) [len: 8]) \
+	 map(alloc:a [pointer assign, bias: 0]) \
+	 map(force_from:*b.0 (*b) [len: 8]) \
+	 map(alloc:b [pointer assign, bias: 0])
+       {
+	 unsigned int * a.2;
+	 unsigned int * b.3;
+
+	 a.2 = a;
+	 *a.2 = 0;
+	 b.3 = b;
+	 *b.3 = 1;
+       }
+     }
+
+     Because:
+     - we can prove for both pointer assign mappings that they point to a
+       force-prefixed mapping, and
+     - the force-prefixed mappings themselves do not have their OMP_CLAUSE_DECL
+       used in the body,
+     we can mark the base pointers for a and b in the offloaded function as
+     restrict.
+
+     KLUDGE: In order to connect the pointer mapping clause to the force_*
+     clause, we need to save the pre-gimplification OMP_CLAUSE_DECL as
+     OMP_CLAUSE_ORIG_DECL.  Note that OMP_CLAUSE_ORIG_DECL is printed as '(*a)'
+     in 'map(force_from:*a.1 (*a) [len: 8])'.  */
+
+  unsigned int ptr_found = 0;
+  unsigned int ptr_matched = 0;
   tree c;
   for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
     {
@@ -3110,13 +3222,31 @@ omp_target_base_pointers_restrict_p (tree clauses)
 	case GOMP_MAP_FORCE_TO:
 	case GOMP_MAP_FORCE_FROM:
 	case GOMP_MAP_FORCE_TOFROM:
+	  {
+	    unsigned int nr
+	      = nr_map_ptr_clauses_pointing_to_clause (clauses, c);
+	    if (DECL_P (OMP_CLAUSE_DECL (c)))
+	      {
+		if (nr != 0)
+		  return false;
+	      }
+	    else
+	      {
+		if (nr != 1)
+		  return false;
+		ptr_matched++;
+	      }
+	  }
+	  break;
+	case GOMP_MAP_POINTER:
+	  ptr_found++;
 	  break;
 	default:
 	  return false;
 	}
     }
 
-  return true;
+  return ptr_found == ptr_matched;
 }
 
 /* Scan a GIMPLE_OMP_TARGET.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-10.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-10.c
new file mode 100644
index 0000000..ce5bbe8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-10.c
@@ -0,0 +1,29 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+#define N 2
+
+void
+foo (void)
+{
+  unsigned int a[N];
+  unsigned int b[N];
+  unsigned int c[N];
+  unsigned int d[N];
+
+#pragma acc kernels copyin (a[0:N]) create (b[0:N]) copyout (c[0:N]) copy (d[0:N])
+  {
+    a[0] = 0;
+    b[0] = 0;
+    c[0] = 0;
+    d[0] = 0;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 4 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 4" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 5" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 8 "ealias" } } */
+
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-9.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-9.c
new file mode 100644
index 0000000..7229fd4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-9.c
@@ -0,0 +1,29 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-ealias-all" } */
+
+#define N 2
+
+void
+foo (unsigned int *a, unsigned int *b, unsigned int *c, unsigned int *d)
+{
+
+#pragma acc kernels copyin (a[0:N]) create (b[0:N]) copyout (c[0:N]) copy (d[0:N])
+  {
+    a[0] = 0;
+    b[0] = 0;
+    c[0] = 0;
+    d[0] = 0;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "clique 1 base 1" 4 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 2" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 3" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 4" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 5" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 6" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 7" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 8" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "clique 1 base 9" 1 "ealias" } } */
+/* { dg-final { scan-tree-dump-times "(?n)clique .* base .*" 12 "ealias" } } */
+
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index caec760..4b94f18 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -666,6 +666,14 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
       pp_colon (pp);
       dump_generic_node (pp, OMP_CLAUSE_DECL (clause),
 			 spc, flags, false);
+      if (OMP_CLAUSE_ORIG_DECL (clause) != NULL_TREE)
+	{
+	  pp_space (pp);
+	  pp_left_paren (pp);
+	  dump_generic_node (pp, OMP_CLAUSE_ORIG_DECL (clause),
+			     spc, flags, false);
+	  pp_right_paren (pp);
+	}
      print_clause_size:
       if (OMP_CLAUSE_SIZE (clause))
 	{
diff --git a/gcc/tree.c b/gcc/tree.c
index 779fe93..45f9a17 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -277,7 +277,7 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_LINK  */
   2, /* OMP_CLAUSE_FROM  */
   2, /* OMP_CLAUSE_TO  */
-  2, /* OMP_CLAUSE_MAP  */
+  3, /* OMP_CLAUSE_MAP  */
   1, /* OMP_CLAUSE_USE_DEVICE_PTR  */
   1, /* OMP_CLAUSE_IS_DEVICE_PTR  */
   2, /* OMP_CLAUSE__CACHE_  */
diff --git a/gcc/tree.h b/gcc/tree.h
index cb52deb..27221ee 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1382,6 +1382,11 @@ extern void protected_set_expr_location (tree, location_t);
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE),	\
 					      OMP_CLAUSE_PRIVATE,	\
 					      OMP_CLAUSE__LOOPTEMP_), 0)
+#define OMP_CLAUSE_ORIG_DECL(NODE)					\
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_RANGE_CHECK (OMP_CLAUSE_CHECK (NODE),	\
+					      OMP_CLAUSE_PRIVATE,	\
+					      OMP_CLAUSE__LOOPTEMP_), 2)
+
 #define OMP_CLAUSE_HAS_LOCATION(NODE) \
   (LOCATION_LOCUS ((OMP_CLAUSE_CHECK (NODE))->omp_clause.locus)		\
   != UNKNOWN_LOCATION)

Reply via email to