Hi Tom! On 2019-01-07T20:11:59+0100, Tom de Vries <tdevr...@suse.de> wrote: > [nvptx] Force vl32 if calling vector-partitionable routines > > With PTX_MAX_VECTOR_LENGTH set to larger than PTX_WARP_SIZE, routines can be > called from offloading regions with vector-size set to larger than warp size. > OTOH, vector-partitionable routines assume warp-sized vector length. > > Detect if we're calling a vector-partitionable routine from an offloading > region, and if so, fall back to warp-sized vector length in that region. > > 2018-12-17 Tom de Vries <tdevr...@suse.de> > > PR target/85486 > * config/nvptx/nvptx.c (has_vector_partitionable_routine_calls_p): New > function. > (nvptx_goacc_validate_dims): Force vl32 if calling vector-partitionable > routines.
> --- a/gcc/config/nvptx/nvptx.c > +++ b/gcc/config/nvptx/nvptx.c > +/* Return true if FNDECL contains calls to vector-partitionable routines. */ > + > +static bool > +has_vector_partitionable_routine_calls_p (tree fndecl) > +{ > + if (!fndecl) > + return false; > + > + basic_block bb; > + FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (fndecl)) > + for (gimple_stmt_iterator i = gsi_start_bb (bb); !gsi_end_p (i); > + gsi_next_nondebug (&i)) > + { > + gimple *stmt = gsi_stmt (i); > + if (gimple_code (stmt) != GIMPLE_CALL) > + continue; (This might use '!is_gimple_call (stmt)'.) > + > + tree callee = gimple_call_fndecl (stmt); > + if (!callee) > + continue; Would there be any other case where this '!callee' conditional doesn't really mean 'gimple_call_internal_p (stmt)'? I thought about suggesting to use that instead, and then maybe 'gcc_assert (callee)' (... which doesn't trigger for any current testcases), but reviewing 'GIMPLE_CALL', I now see further 'is_gimple_call_addr' legitimate cases. What do these mean, here? And, should we add a comment why 'continue' is fine then, instead of fail-safe 'return true'? Couldn't an 'internal_fn' potentially also make use of OpenACC parallelism? > + > + tree attrs = oacc_get_fn_attrib (callee); > + if (attrs == NULL_TREE) > + return false; That's not correct, as far as I can tell: if the current callee doesn't have an 'oacc function' attribute, we *stop* here any further processing, and 'return false' indicating that there are no "calls to vector-partitionable routines". See bug fix and adjusted test case in attached patch "Force vl32 if calling vector-partitionable routines: fix case where callee doesn't have 'oacc function' attribute [PR85486]". OK to push? > + > + int partition_level = oacc_fn_attrib_level (attrs); > + bool seq_routine_p = partition_level == GOMP_DIM_MAX; > + if (!seq_routine_p) > + return true; > + } > + > + return false; > +} > @@ -5611,6 +5646,16 @@ nvptx_goacc_validate_dims_1 (tree decl, int dims[], > int fn_level) > old_dims[i] = dims[i]; > > const char *vector_reason = NULL; > + if (offload_region_p && has_vector_partitionable_routine_calls_p (decl)) > + { > + if (dims[GOMP_DIM_VECTOR] > PTX_WARP_SIZE) > + { > + vector_reason = G_("using vector_length (%d) due to call to" > + " vector-partitionable routine, ignoring %d"); > + dims[GOMP_DIM_VECTOR] = PTX_WARP_SIZE; > + } > + } > + > if (dims[GOMP_DIM_VECTOR] == 0) > { > vector_reason = G_("using vector_length (%d), ignoring runtime > setting"); Grüße Thomas ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
>From 0399c9023b717ea686db912ca5c133a2d30752e4 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tho...@codesourcery.com> Date: Wed, 28 Oct 2020 12:04:46 +0100 Subject: [PATCH] Force vl32 if calling vector-partitionable routines: fix case where callee doesn't have 'oacc function' attribute [PR85486] gcc/ PR target/85486 * config/nvptx/nvptx.c (has_vector_partitionable_routine_calls_p): Fix case where callee doesn't have 'oacc function' attribute. libgomp/ PR target/85486 * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Extend. --- gcc/config/nvptx/nvptx.c | 3 ++- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c | 10 ++++++++++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 17349475fff0..61a756fc6448 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -5674,7 +5674,8 @@ has_vector_partitionable_routine_calls_p (tree fndecl) tree attrs = oacc_get_fn_attrib (callee); if (attrs == NULL_TREE) - return false; + /* Implicitly 'seq'. */ + continue; int partition_level = oacc_fn_attrib_level (attrs); bool seq_routine_p = partition_level == GOMP_DIM_MAX; diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c index 0d98b82f9932..38a61624d9f8 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c @@ -7,6 +7,7 @@ /* Minimized from ref-1.C. */ #include <stdio.h> +#include <stdlib.h> #pragma acc routine vector void __attribute__((noinline, noclone)) @@ -33,6 +34,15 @@ main (void) #pragma acc parallel copy (ary) VECTOR_LENGTH /* { dg-warning "using vector_length \\(32\\) due to call to vector-partitionable routine, ignoring 128" } */ { + /* Call a routine that is not tagged OpenACC 'routine' (but is still + available by default; thus something from libc), thus is implicitly + 'seq'. */ + { + void *null = NULL; + asm ("" : : "g" (&null) : "memory"); /* Optimization barrier. */ + free (null); + } + Vector (&ary[0][0], m * n, (1 << 24) - (1 << 16)); } -- 2.17.1