On 23/07/2025 16:23, Richard Biener wrote:
>>> That said, the hook is a bit black/white - whether the target prefers
>>> a gather/scatter over N piecewise operations with equal stride depends
>>> at least on the vector mode.  On x86_64 for V2DImode definitely no
>>> gather, for V16SFmode it probably depends (V64QImode gather isn't
>>> supported).
>> I expected this one might come up.  What features would you like to see
>> in a hook?  Should it be the mode, or the vectype?  I also see masked_p,
>> which might be relevant to some architectures?
> The mode, the (possibly constant) stride and the group size.

How about this updated patch?
I couldn't figure out how to get the stride, so I've used the "scale",
which matches what various other hooks and instructions get.
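
For illustration, a target that wants to be more selective could key the
decision off the mode (and, where useful, the scale and group size). A
minimal sketch, with a made-up cutoff, and not part of this patch:

  static bool
  example_prefer_gather_scatter (machine_mode mode,
                                 int scale ATTRIBUTE_UNUSED,
                                 unsigned int group_size ATTRIBUTE_UNUSED)
  {
    /* Hypothetical cutoff: for a two-element vector such as V2DImode,
       piecewise loads tend to win, so only prefer gather/scatter when
       the mode has more lanes than that.  */
    return maybe_gt (GET_MODE_NUNITS (mode), 2);
  }
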
The hook is now called after the call to
vect_use_strided_gather_scatters_p. This is partly so that the hook
isn't consulted in cases where gather/scatter would never have been
allowed anyway, but mostly because the gs_info isn't populated until
after that call.
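
For concreteness, the kind of loop this affects is a grouped strided
access along these lines (an illustrative example, not taken from the
patch), where single_element_p is false and the heuristic therefore
used to rule out gather/scatter:

  typedef struct { float x, y; } pair;

  void
  f (float *restrict dst, pair *restrict src, int n)
  {
    for (int i = 0; i < n; i++)
      dst[i] = src[i].x + src[i].y;
  }

On GCN every vector memory access is a gather/scatter anyway, so letting
the target opt in here keeps such loops out of the elementwise fallback.
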
Andrew
From dac944b22e3e7543fce946757a3990b9353e7bd3 Mon Sep 17 00:00:00 2001
From: Julian Brown <jul...@codesourcery.com>
Date: Wed, 25 Nov 2020 09:08:01 -0800
Subject: [PATCH] vect: Add target hook to prefer gather/scatter instructions
For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus vectorization fails.
This patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates the hook for GCN. This allows
much better code to be generated for affected loops.
Co-authored-by: Julian Brown <jul...@codesourcery.com>
gcc/
	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Document
	the new hook.
	* doc/tm.texi: Regenerate.
	* target.def (prefer_gather_scatter): New hook.
	* tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
	gather/scatter instructions to the scalar/elementwise fallback.
	* config/gcn/gcn.cc (gcn_prefer_gather_scatter): New function.
	(TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook.
---
 gcc/config/gcn/gcn.cc  | 12 ++++++++++++
 gcc/doc/tm.texi        |  5 +++++
 gcc/doc/tm.texi.in     |  2 ++
 gcc/target.def         | 10 ++++++++++
 gcc/tree-vect-stmts.cc |  6 ++++--
5 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 81871852148..6f2631c2138 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5789,6 +5789,16 @@ gcn_libc_has_function (enum function_class fn_class,
   return bsd_libc_has_function (fn_class, type);
 }
 
+/* Implement TARGET_VECTORIZE_PREFER_GATHER_SCATTER.  */
+
+static bool
+gcn_prefer_gather_scatter (machine_mode ARG_UNUSED (mode),
+                           int ARG_UNUSED (scale),
+                           unsigned int ARG_UNUSED (group_size))
+{
+  return true;
+}
+
 /* }}} */
 
 /* {{{ md_reorg pass.  */
@@ -7985,6 +7995,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vectorize_builtin_vectorized_function
 #undef TARGET_VECTORIZE_GET_MASK_MODE
 #define TARGET_VECTORIZE_GET_MASK_MODE gcn_vectorize_get_mask_mode
+#undef TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER gcn_prefer_gather_scatter
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE gcn_vectorize_preferred_simd_mode
 #undef TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5e305643b3a..3a98692b236 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6511,6 +6511,11 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER (machine_mode @var{mode}, int @var{scale}, unsigned int @var{group_size})
+This hook returns true if gather loads or scatter stores are cheaper on
+this target than a sequence of elementwise loads or stores.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index eccc4d88493..b03ad4c97c6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4311,6 +4311,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
+@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index 38903eb567a..b654069d8f9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2056,6 +2056,16 @@ all zeros.  GCC can then try to branch around the instruction instead.",
 (unsigned ifn),
 default_empty_mask_is_expensive)
 
+/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if
+   we cannot use a contiguous access.  */
+DEFHOOK
+(prefer_gather_scatter,
+ "This hook returns true if gather loads or scatter stores are cheaper on\n\
+this target than a sequence of elementwise loads or stores.",
+ bool,
+ (machine_mode mode, int scale, unsigned int group_size),
+ hook_bool_void_false)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 2e9b3d2e686..e6dff106728 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2349,11 +2349,13 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
      allows us to use contiguous accesses.  */
   if ((*memory_access_type == VMAT_ELEMENTWISE
        || *memory_access_type == VMAT_STRIDED_SLP)
-      && single_element_p
       && SLP_TREE_LANES (slp_node) == 1
       && loop_vinfo
       && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
-                                             masked_p, gs_info, elsvals))
+                                             masked_p, gs_info, elsvals)
+      && (targetm.vectorize.prefer_gather_scatter (TYPE_MODE (vectype),
+                                                   gs_info->scale, group_size)
+          || single_element_p))
     *memory_access_type = VMAT_GATHER_SCATTER;
 
   if (*memory_access_type == VMAT_CONTIGUOUS_DOWN
--
2.50.0