On Tue, 2025-01-21 at 23:18 +0800, Xi Ruoyao wrote:
> On Tue, 2025-01-21 at 22:14 +0800, Xi Ruoyao wrote:
> > > > in GCC 13 the result is:
> > > > 
> > > >         or      $r12,$r4,$r0
> > > 
> > > Hmm, this strange move is caused by "&" in bstrpick_alsl_paired. 
> > > Is it
> > > really needed for the fusion?
> > 
> > Never mind, it's needed or a = ((a & 0xffffffff) << 1) + a will blow
> > up.
> > Stupid I.
> 
> And my code is indeed broken due to the missing '&':
> 
> /* { dg-do run } */
> /* { dg-options "-O2" } */
> 
> register long x asm ("s0");
> 
> #define TEST(x) (int)(((x & 0x114) << 3) + x)
> 
> [[gnu::noipa]] void
> test (void)
> {
>   x = TEST (x);
> }
> 
> int
> main (void)
> {
>   x = 0xffff;
>   test ();
>   if (x != TEST (0xffff))
>     __builtin_trap ();
> }
> 
> ends up:
> 
> 0000000000000760 <test>:
>  760: 034452f7        andi            $s0, $s0, 0x114
>  764: 00055ef7        alsl.w          $s0, $s0, $s0, 0x3
>  768: 4c000020        ret
> 
> and fails.  The fix would be like https://gcc.gnu.org/r15-5074.

Now bootstrapping & testing two patches attached here instead.  They
should fix the wrong-code and miss-optimization regressions, except the
instruction ordering which requires TARGET_SCHED_MACRO_FUSION_PAIR_P.

> > > >         bstrpick.d      $r4,$r12,31,0
> > > >         alsl.d  $r4,$r4,$r6,2
> > > >         or      $r12,$r5,$r0
> > > >         bstrpick.d      $r5,$r12,31,0
> > > >         alsl.d  $r5,$r5,$r6,2
> > > >         jr      $r1

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
From 09a4f641331709685b6b5fbcb07d2a26b42a5576 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao <xry...@xry111.site>
Date: Tue, 21 Jan 2025 23:01:38 +0800
Subject: [PATCH 1/2] LoongArch: Fix wrong code with
 <optab>_alsl_reversesi_extended

The second source register of this insn cannot be the same as the
destination register.

gcc/ChangeLog:

	* config/loongarch/loongarch.md
	(<optab>_alsl_reversesi_extended): Add '&' to the destination
	register constraint and append '0' to the first source register
	constraint to indicate the destination register cannot be same
	as the second source register, and change the split condition to
	reload_completed so that the insn will be split only after RA in
	order to obtain allocated registers that satisfy the above
	constraints.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
	test.
---
 gcc/config/loongarch/loongarch.md             |  6 +++---
 .../loongarch/bitwise-shift-reassoc-clobber.c | 21 +++++++++++++++++++
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 223e2b9f37f..1392325038c 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3160,13 +3160,13 @@ (define_insn_and_split "<optab>_shift_reverse<X:mode>"
 ;; add.w => alsl.w, so implement slli.d + and + add.w => and + alsl.w on
 ;; our own.
 (define_insn_and_split "<optab>_alsl_reversesi_extended"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=&r")
 	(sign_extend:DI
 	  (plus:SI
 	    (subreg:SI
 	      (any_bitwise:DI
 		(ashift:DI
-		  (match_operand:DI 1 "register_operand" "r")
+		  (match_operand:DI 1 "register_operand" "r0")
 		  (match_operand:SI 2 "const_immalsl_operand" ""))
 		(match_operand:DI 3 "const_int_operand" "i"))
 	      0)
@@ -3175,7 +3175,7 @@ (define_insn_and_split "<optab>_alsl_reversesi_extended"
    && loongarch_reassoc_shift_bitwise (<is_and>, operands[2], operands[3],
 				       SImode)"
   "#"
-  "&& true"
+  "&& reload_completed"
   [; r0 = r1 [&|^] r3 is emitted in PREPARATION-STATEMENTS because we
    ; need to handle a special case, see below.
    (set (match_dup 0)
diff --git a/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c b/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c
new file mode 100644
index 00000000000..9985a18ea08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-clobber.c
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+register long x asm ("s0");
+
+#define TEST(x) (int)(((x & 0x114) << 3) + x)
+
+[[gnu::noipa]] void
+test (void)
+{
+  x = TEST (x);
+}
+
+int
+main (void)
+{
+  x = 0xffff;
+  test ();
+  if (x != TEST (0xffff))
+    __builtin_trap ();
+}
-- 
2.48.1

From 5c0b402020867110ba5c33760f961733fddfee01 Mon Sep 17 00:00:00 2001
From: Xi Ruoyao <xry...@xry111.site>
Date: Tue, 21 Jan 2025 23:36:25 +0800
Subject: [PATCH 2/2] LoongArch: Partially fix code regression from r15-7062

The uarch can fuse bstrpick.d rd,rs1,31,0 and alsl.d rd,rd,rs2,shamt,
so for this special case we should use alsl.d instead of slli.d.  And
I'd hoped late combine to handle slli.d + and + add.d => and + slli.d +
add.d => and + alsl.d, but it does not always work (even before the
alsl.d special case gets in the way).  So let's handle this on our own.

The fix is partial: to make the macro-fusion really work we need to
implement TARGET_SCHED_MACRO_FUSION_PAIR_P for it.  Thus the
bitwise-shift-reassoc-fuse.c case now xfail.

gcc/ChangeLog:

	* config/loongarch/loongarch.md (<optab>_shift_reverse<X:mode>):
	Emit alsl.d instead of slli.d if new mask is 0xffffffff and
	shamt is const_immalsl_operand.
	(<optab>_alsl_reverse<X:mode>): New define_insn_and_split.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/bitwise-shift-reassoc-dual.c: New test.
	* gcc.target/loongarch/bitwise-shift-reassoc-fuse.c: New test.
---
 gcc/config/loongarch/loongarch.md             | 55 +++++++++++++++++--
 .../loongarch/bitwise-shift-reassoc-dual.c    | 18 ++++++
 .../loongarch/bitwise-shift-reassoc-fuse.c    | 16 ++++++
 3 files changed, 83 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-dual.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-fuse.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 1392325038c..2728d5e4d1c 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -3138,8 +3138,54 @@ (define_insn_and_split "<optab>_shift_reverse<X:mode>"
 				       <MODE>mode)"
   "#"
   "&& true"
+  [(set (match_dup 0) (ashift:X (match_dup 0) (match_dup 2)))]
+  {
+    operands[3] = loongarch_reassoc_shift_bitwise (<is_and>,
+						   operands[2],
+						   operands[3],
+						   <MODE>mode);
+
+    if (ins_zero_bitmask_operand (operands[3], <MODE>mode))
+      {
+	gcc_checking_assert (<is_and>);
+	emit_move_insn (operands[0], operands[1]);
+	operands[1] = operands[0];
+      }
+
+    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], operands[3]));
+
+    if (<is_and>
+	&& TARGET_64BIT
+	&& si_mask_operand (operands[3], DImode)
+	&& const_immalsl_operand (operands[2], SImode))
+      {
+	/* Special case for bstrpick.d + alsl.d fusion
+	   TODO: TARGET_SCHED_MACRO_FUSION_PAIR_P */
+	emit_insn (gen_alsldi3 (operands[0], operands[0],
+				operands[2], gen_rtx_REG (DImode, 0)));
+	DONE;
+      }
+  })
+
+;; The late_combine2 pass can handle slli.d + add.d => alsl.d, but it seems
+;; not covering all cases, and obviously it's broken by the special case
+;; using alsl.d rd,rd,r0,shamt instead of slli.d rd,rd,shamt.  So
+;; implement slli.d + and + add.d => and + alsl.d on our own.
+(define_insn_and_split "<optab>_alsl_reverse<X:mode>"
+  [(set (match_operand:X 0 "register_operand" "=&r")
+	(plus:X
+	  (any_bitwise:X
+	    (ashift:X (match_operand:X  1 "register_operand" "r0")
+		      (match_operand:SI 2 "const_immalsl_operand" "i"))
+	    (match_operand:X 3 "const_int_operand" "i"))
+	  (match_operand:X 4 "register_operand" "r")))]
+  "loongarch_reassoc_shift_bitwise (<is_and>, operands[2], operands[3],
+				    <MODE>mode)"
+  "#"
+  "&& reload_completed"
   [(set (match_dup 0) (any_bitwise:X (match_dup 1) (match_dup 3)))
-   (set (match_dup 0) (ashift:X (match_dup 0) (match_dup 2)))]
+   (set (match_dup 0) (plus:X (ashift:X (match_dup 0) (match_dup 2))
+			      (match_dup 4)))]
   {
     operands[3] = loongarch_reassoc_shift_bitwise (<is_and>,
 						   operands[2],
@@ -3154,11 +3200,8 @@ (define_insn_and_split "<optab>_shift_reverse<X:mode>"
       }
   })
 
-;; The late_combine2 pass can handle slli.d + add.d => alsl.d, so we
-;; already have slli.d + any_bitwise + add.d => any_bitwise + slli.d +
-;; add.d => any_bitwise + alsl.d.  But late_combine2 cannot handle slli.d +
-;; add.w => alsl.w, so implement slli.d + and + add.w => and + alsl.w on
-;; our own.
+;; Likewise for slli.d + and + add.w => and + alsl.w, note that late
+;; combine cannot help this at all.
 (define_insn_and_split "<optab>_alsl_reversesi_extended"
   [(set (match_operand:DI 0 "register_operand" "=&r")
 	(sign_extend:DI
diff --git a/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-dual.c b/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-dual.c
new file mode 100644
index 00000000000..ad66daf6caa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-dual.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "bstrpick\\.\[wd\]" 2 } } */
+/* { dg-final { scan-assembler-times "alsl\\.\[wd\]" 2 } } */
+
+struct Pair { unsigned long a, b; };
+
+struct Pair
+test (struct Pair p, unsigned long x)
+{
+  p.a &= 0xfffffff;
+  p.a <<= 2;
+  p.a += x;
+  p.b &= 0xfffffff;
+  p.b <<= 2;
+  p.b += x;
+  return p;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-fuse.c b/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-fuse.c
new file mode 100644
index 00000000000..a75c2f0ce12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bitwise-shift-reassoc-fuse.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler-times "alsl.d" 2 } } */
+/* { dg-final { scan-assembler-times "bstrpick\\.d\[^\n\]*\n\talsl.d" 2 { xfail *-*-* } } } */
+
+struct Pair { unsigned long a, b; };
+
+struct Pair
+test (struct Pair p)
+{
+  p.a &= 0xffffffff;
+  p.a <<= 2;
+  p.b &= 0xffffffff;
+  p.b <<= 2;
+  return p;
+}
-- 
2.48.1

Reply via email to