Hi Tom!

On 2019-01-07T20:11:59+0100, Tom de Vries <tdevr...@suse.de> wrote:
> [nvptx] Force vl32 if calling vector-partitionable routines
>
> With PTX_MAX_VECTOR_LENGTH set to larger than PTX_WARP_SIZE, routines can be
> called from offloading regions with vector-size set to larger than warp size.
> OTOH, vector-partitionable routines assume warp-sized vector length.
>
> Detect if we're calling a vector-partitionable routine from an offloading
> region, and if so, fall back to warp-sized vector length in that region.
>
> 2018-12-17  Tom de Vries  <tdevr...@suse.de>
>
>       PR target/85486
>       * config/nvptx/nvptx.c (has_vector_partitionable_routine_calls_p): New
>       function.
>       (nvptx_goacc_validate_dims): Force vl32 if calling vector-partitionable
>       routines.

> --- a/gcc/config/nvptx/nvptx.c
> +++ b/gcc/config/nvptx/nvptx.c

> +/* Return true if FNDECL contains calls to vector-partitionable routines.  */
> +
> +static bool
> +has_vector_partitionable_routine_calls_p (tree fndecl)
> +{
> +  if (!fndecl)
> +    return false;
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (fndecl))
> +    for (gimple_stmt_iterator i = gsi_start_bb (bb); !gsi_end_p (i);
> +      gsi_next_nondebug (&i))
> +      {
> +     gimple *stmt = gsi_stmt (i);
> +     if (gimple_code (stmt) != GIMPLE_CALL)
> +       continue;

(This might use '!is_gimple_call (stmt)'.)

> +
> +     tree callee = gimple_call_fndecl (stmt);
> +     if (!callee)
> +       continue;

Would there be any other case where this '!callee' conditional doesn't
really mean 'gimple_call_internal_p (stmt)'?  I thought about suggesting
to use that instead, and then maybe 'gcc_assert (callee)' (... which
doesn't trigger for any current testcases), but reviewing 'GIMPLE_CALL',
I now see further 'is_gimple_call_addr' legitimate cases.  What do these
mean, here?

And, should we add a comment why 'continue' is fine then, instead of
fail-safe 'return true'?

Couldn't an 'internal_fn' potentially also make use of OpenACC
parallelism?

> +
> +     tree attrs  = oacc_get_fn_attrib (callee);
> +     if (attrs == NULL_TREE)
> +       return false;

That's not correct, as far as I can tell: if the current callee doesn't
have an 'oacc function' attribute, we *stop* here any further processing,
and 'return false' indicating that there are no "calls to
vector-partitionable routines".  See bug fix and adjusted test case in
attached patch "Force vl32 if calling vector-partitionable routines: fix
case where callee doesn't have 'oacc function' attribute [PR85486]".  OK
to push?

> +
> +     int partition_level = oacc_fn_attrib_level (attrs);
> +     bool seq_routine_p = partition_level == GOMP_DIM_MAX;
> +     if (!seq_routine_p)
> +       return true;
> +      }
> +
> +  return false;
> +}

> @@ -5611,6 +5646,16 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], 
> int fn_level)
>      old_dims[i] = dims[i];
>
>    const char *vector_reason = NULL;
> +  if (offload_region_p && has_vector_partitionable_routine_calls_p (decl))
> +    {
> +      if (dims[GOMP_DIM_VECTOR] > PTX_WARP_SIZE)
> +     {
> +       vector_reason = G_("using vector_length (%d) due to call to"
> +                          " vector-partitionable routine, ignoring %d");
> +       dims[GOMP_DIM_VECTOR] = PTX_WARP_SIZE;
> +     }
> +    }
> +
>    if (dims[GOMP_DIM_VECTOR] == 0)
>      {
>        vector_reason = G_("using vector_length (%d), ignoring runtime 
> setting");


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
>From 0399c9023b717ea686db912ca5c133a2d30752e4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tho...@codesourcery.com>
Date: Wed, 28 Oct 2020 12:04:46 +0100
Subject: [PATCH] Force vl32 if calling vector-partitionable routines: fix case
 where callee doesn't have 'oacc function' attribute [PR85486]

	gcc/
	PR target/85486
	* config/nvptx/nvptx.c (has_vector_partitionable_routine_calls_p):
	Fix case where callee doesn't have 'oacc function' attribute.
	libgomp/
	PR target/85486
	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Extend.
---
 gcc/config/nvptx/nvptx.c                              |  3 ++-
 libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c | 10 ++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 17349475fff0..61a756fc6448 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5674,7 +5674,8 @@ has_vector_partitionable_routine_calls_p (tree fndecl)
 
 	tree attrs  = oacc_get_fn_attrib (callee);
 	if (attrs == NULL_TREE)
-	  return false;
+	  /* Implicitly 'seq'.  */
+	  continue;
 
 	int partition_level = oacc_fn_attrib_level (attrs);
 	bool seq_routine_p = partition_level == GOMP_DIM_MAX;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
index 0d98b82f9932..38a61624d9f8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
@@ -7,6 +7,7 @@
 /* Minimized from ref-1.C.  */
 
 #include <stdio.h>
+#include <stdlib.h>
 
 #pragma acc routine vector
 void __attribute__((noinline, noclone))
@@ -33,6 +34,15 @@ main (void)
 
 #pragma acc parallel copy (ary) VECTOR_LENGTH /* { dg-warning "using vector_length \\(32\\) due to call to vector-partitionable routine, ignoring 128" } */
   {
+    /* Call a routine that is not tagged OpenACC 'routine' (but is still
+       available by default; thus something from libc), thus is implicitly
+       'seq'.  */
+    {
+      void *null = NULL;
+      asm ("" : : "g" (&null) : "memory");  /* Optimization barrier.  */
+      free (null);
+    }
+
     Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16));
   }
 
-- 
2.17.1

Reply via email to