> So this isn't a regression, but I can also understand the desire to fix 
> this fairly significant performance issue.

I'd argue it is a regression as the match.pd pattern that merges the permutes
was introduced after GCC 14.

After giving it a bit more thought, I'd still like to send the attached v2
because it excludes fewer cases and, consequently, requires fewer changes to
the test suite.

Regtested on rv64gcv_zvl512b.
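
For readers who don't want to parse the hunk below, here is a rough
stand-alone model of which permutes v2 still accepts by default.  It is
only a sketch and not GCC code: the function names and the plain
std::vector selector are made up for illustration, and it assumes the
usual convention that indices 0..n-1 pick from the first source and
n..2n-1 from the second.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model of the "is_simple" test in the patch.  SEL holds one
// index per output element; with n = sel.size (), indices below n refer to
// the first source vector and indices n..2n-1 to the second (assumption).
bool
is_simple_permute (const std::vector<uint64_t> &sel, bool one_vector_p)
{
  // Single-source permutes are always considered simple.
  if (one_vector_p || sel.empty ())
    return true;

  const uint64_t nunits = sel.size ();

  // All selector elements equal: models const_vec_duplicate_p.
  bool duplicate
    = std::all_of (sel.begin (), sel.end (),
		   [&sel] (uint64_t i) { return i == sel.front (); });

  // All indices within [0, nunits - 1]: models const_vec_all_in_range_p.
  bool first_source_only
    = std::all_of (sel.begin (), sel.end (),
		   [nunits] (uint64_t i) { return i < nunits; });

  return duplicate || first_source_only;
}

// Hypothetical model of the early exit: anything that is not simple is
// rejected unless the two-source parameter is enabled.
bool
accept_generic_permute (const std::vector<uint64_t> &sel, bool one_vector_p,
			bool two_source_permutes)
{
  return is_simple_permute (sel, one_vector_p) || two_source_permutes;
}
```

With four elements, an interleave-style selector such as {0, 4, 1, 5} is
what gets rejected by default, while {3, 2, 1, 0} or a splat like
{5, 5, 5, 5} still goes through.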

Regards
 Robin

[PATCH v2] RISC-V: Disable two-source permutes for now [PR117173].

After testing on the BPI (4.2% improvement for x264 input 1, 4.4% for
input 2) and the discussion in PR117173 I figured it's best to disable
the two-source permutes by default for now.

The patch adds a parameter "riscv-two-source-permutes" which restores
the old behavior.

        PR target/117173

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (shuffle_generic_patterns): Only
        support single-source permutes by default.
        * config/riscv/riscv.opt: New param "riscv-two-source-permutes".

gcc/testsuite/ChangeLog:

        * gcc.dg/fold-perm-2.c: Run with two-source permutes.
        * gcc.dg/pr54346.c: Ditto.
---
 gcc/config/riscv/riscv-v.cc        | 13 ++++++++++++-
 gcc/config/riscv/riscv.opt         |  4 ++++
 gcc/testsuite/gcc.dg/fold-perm-2.c |  1 +
 gcc/testsuite/gcc.dg/pr54346.c     |  1 +
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e1172e9c7d2..9847439ca77 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3947,11 +3947,22 @@ shuffle_generic_patterns (struct expand_vec_perm_d *d)
   if (!get_gather_index_mode (d).exists (&sel_mode))
     return false;
 
+  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+  poly_uint64 nunits = GET_MODE_NUNITS (sel_mode);
+  rtx elt;
+
+  bool is_simple = d->one_vector_p
+    || const_vec_duplicate_p (sel, &elt)
+    || (nunits.is_constant ()
+       && const_vec_all_in_range_p (sel, 0, nunits - 1));
+
+  if (!is_simple && !riscv_two_source_permutes)
+    return false;
+
   /* Success! */
   if (d->testing_p)
     return true;
 
-  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
   /* Some FIXED-VLMAX/VLS vector permutation situations call targethook
      instead of expand vec_perm<mode>, we handle it directly.  */
   expand_vec_perm (d->target, d->op0, d->op1, sel);
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index f51f8fd1cdf..ed0695e20d3 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -622,6 +622,10 @@ Enum(vsetvl_strategy) String(optim-no-fusion) Value(VSETVL_OPT_NO_FUSION)
 Target Undocumented RejectNegative Joined Enum(vsetvl_strategy) Var(vsetvl_strategy) Init(VSETVL_OPT)
 -param=vsetvl-strategy=<string>        Set the optimization level of VSETVL insert pass.
 
+-param=riscv-two-source-permutes
+Target Undocumented Uinteger Var(riscv_two_source_permutes) Init(0)
+-param=riscv-two-source-permutes Enable permutes/gathers with two source vectors.
+
 Enum
 Name(stringop_strategy) Type(enum stringop_strategy_enum)
 Valid arguments to -mstringop-strategy=:
diff --git a/gcc/testsuite/gcc.dg/fold-perm-2.c b/gcc/testsuite/gcc.dg/fold-perm-2.c
index 1a4ab4065de..9fd809ee296 100644
--- a/gcc/testsuite/gcc.dg/fold-perm-2.c
+++ b/gcc/testsuite/gcc.dg/fold-perm-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-fre1" } */
+/* { dg-additional-options "--param=riscv-two-source-permutes" { target riscv*-*-* } } */
 
 typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
 typedef unsigned int vecu __attribute__ ((vector_size (4 * sizeof (unsigned int))));
diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
index 5ec0609f1e5..b78e0533ac2 100644
--- a/gcc/testsuite/gcc.dg/pr54346.c
+++ b/gcc/testsuite/gcc.dg/pr54346.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-dse1 -Wno-psabi" } */
+/* { dg-additional-options "--param=riscv-two-source-permutes" { target riscv*-*-* } } */
 
 typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));

-- 
