On 23/07/2025 16:23, Richard Biener wrote:
>>> That said, the hook is a bit black/white - whether the target prefers
>>> a gather/scatter over N piecewise operations with equal stride depends
>>> at least on the vector mode.  On x86_64 for V2DImode definitely no
>>> gather, for V16SFmode it probably depends (V64QImode gather isn't
>>> supported).
>> I expected this one might come up.  What features would you like to see
>> in a hook?  Should it be the mode, or the vectype?  I also see masked_p,
>> which might be relevant to some architectures?
> The mode, the (possibly constant) stride and the group size.

How about this updated patch?
I couldn't figure out how to get the stride, so I've used the "scale",
which matches what various other hooks and instructions get.
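
For illustration, a target that wants to be more selective could key the
decision off the mode (and, where useful, the scale and group size). A
minimal sketch, with a made-up cutoff, and not part of this patch:

  static bool
  example_prefer_gather_scatter (machine_mode mode,
                                 int scale ATTRIBUTE_UNUSED,
                                 unsigned int group_size ATTRIBUTE_UNUSED)
  {
    /* Hypothetical cutoff: for a two-element vector such as V2DImode,
       piecewise loads tend to win, so only prefer gather/scatter when
       the mode has more lanes than that.  */
    return maybe_gt (GET_MODE_NUNITS (mode), 2);
  }
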
The hook is now called after the call to
vect_use_strided_gather_scatters_p. This is partly so that the hook
isn't consulted in cases where gather/scatter would never have been
allowed anyway, but mostly because the gs_info isn't populated until
after that call.
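
For concreteness, the kind of loop this affects is a grouped strided
access along these lines (an illustrative example, not taken from the
patch), where single_element_p is false and the heuristic therefore
used to rule out gather/scatter:

  typedef struct { float x, y; } pair;

  void
  f (float *restrict dst, pair *restrict src, int n)
  {
    for (int i = 0; i < n; i++)
      dst[i] = src[i].x + src[i].y;
  }

On GCN every vector memory access is a gather/scatter anyway, so letting
the target opt in here keeps such loops out of the elementwise fallback.
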
Andrew
From dac944b22e3e7543fce946757a3990b9353e7bd3 Mon Sep 17 00:00:00 2001
From: Julian Brown <jul...@codesourcery.com>
Date: Wed, 25 Nov 2020 09:08:01 -0800
Subject: [PATCH] vect: Add target hook to prefer gather/scatter instructions
For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus vectorization fails.
This patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates the hook for GCN. This allows
much better code to be generated for affected loops.
Co-authored-by: Julian Brown <jul...@codesourcery.com>
gcc/
	* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Document
	the new hook.
	* doc/tm.texi: Regenerate.
	* target.def (prefer_gather_scatter): New hook.
	* tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
	gather/scatter instructions to the scalar/elementwise fallback.
	* config/gcn/gcn.cc (gcn_prefer_gather_scatter): New function.
	(TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook.
---
 gcc/config/gcn/gcn.cc  | 12 ++++++++++++
 gcc/doc/tm.texi        |  5 +++++
 gcc/doc/tm.texi.in     |  2 ++
 gcc/target.def         | 10 ++++++++++
 gcc/tree-vect-stmts.cc |  6 ++++--
5 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 81871852148..6f2631c2138 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5789,6 +5789,16 @@ gcn_libc_has_function (enum function_class fn_class,
   return bsd_libc_has_function (fn_class, type);
 }
 
+/* Implement TARGET_VECTORIZE_PREFER_GATHER_SCATTER.  */
+
+static bool
+gcn_prefer_gather_scatter (machine_mode ARG_UNUSED (mode),
+                           int ARG_UNUSED (scale),
+                           unsigned int ARG_UNUSED (group_size))
+{
+  return true;
+}
+
 /* }}} */
 
 /* {{{ md_reorg pass.  */
@@ -7985,6 +7995,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vectorize_builtin_vectorized_function
 #undef TARGET_VECTORIZE_GET_MASK_MODE
 #define TARGET_VECTORIZE_GET_MASK_MODE gcn_vectorize_get_mask_mode
+#undef TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER gcn_prefer_gather_scatter
 #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE gcn_vectorize_preferred_simd_mode
 #undef TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5e305643b3a..3a98692b236 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6511,6 +6511,11 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER (machine_mode @var{mode}, int @var{scale}, unsigned int @var{group_size})
+This hook returns true if gather loads or scatter stores are cheaper on
+this target than a sequence of elementwise loads or stores.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}, @var{bool})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index eccc4d88493..b03ad4c97c6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4311,6 +4311,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
+@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index 38903eb567a..b654069d8f9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2056,6 +2056,16 @@ all zeros.  GCC can then try to branch around the instruction instead.",
 (unsigned ifn),
 default_empty_mask_is_expensive)
 
+/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if
+   we cannot use a contiguous access.  */
+DEFHOOK
+(prefer_gather_scatter,
+ "This hook returns true if gather loads or scatter stores are cheaper on\n\
+this target than a sequence of elementwise loads or stores.",
+ bool,
+ (machine_mode mode, int scale, unsigned int group_size),
+ hook_bool_void_false)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 2e9b3d2e686..e6dff106728 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2349,11 +2349,13 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
      allows us to use contiguous accesses.  */
   if ((*memory_access_type == VMAT_ELEMENTWISE
        || *memory_access_type == VMAT_STRIDED_SLP)
-      && single_element_p
       && SLP_TREE_LANES (slp_node) == 1
       && loop_vinfo
       && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
-                                             masked_p, gs_info, elsvals))
+                                             masked_p, gs_info, elsvals)
+      && (targetm.vectorize.prefer_gather_scatter (TYPE_MODE (vectype),
+                                                   gs_info->scale, group_size)
+          || single_element_p))
     *memory_access_type = VMAT_GATHER_SCATTER;
 
   if (*memory_access_type == VMAT_CONTIGUOUS_DOWN
--
2.50.0