On Tue, Apr 1, 2025 at 8:17 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> This patch forestalls a regression in gcc.dg/rtl/x86_64/vector_eq.c
> with the patch for PR116398.  The test wants:
>
>       (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) 
> (const_int 0) (const_int 0) (const_int 0)])))
>       (cinsn 5 (set (reg:V4SI <2>)
>                     (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>))))
>
> to be folded to a vector of -1s.  One unusual thing about the fold
> is that the <1> in the second insn is uninitialised; it looks like
> it should be replaced by <0>, or that there should be an insn 4 that
> copies <0> to <1>.
>
> As it stands, the test relies on init-regs to insert a zero
> initialisation of <1>.  This happens after all the cse/pre/fwprop
> stuff, with only dce passes between init-regs and combine.
> Combine therefore sees:
>
> (insn 3 2 8 2 (set (reg:V4SI 98)
>         (const_vector:V4SI [
>                 (const_int 0 [0]) repeated x4
>             ])) 2403 {movv4si_internal}
>      (nil))
> (insn 8 3 9 2 (clobber (reg:V4SI 99)) -1
>      (nil))
> (insn 9 8 5 2 (set (reg:V4SI 99)
>         (const_vector:V4SI [
>                 (const_int 0 [0]) repeated x4
>             ])) -1
>      (nil))
> (insn 5 9 7 2 (set (reg:V4SI 100)
>         (eq:V4SI (reg:V4SI 98)
>             (reg:V4SI 99))) 7874 {*sse2_eqv4si3}
>      (expr_list:REG_DEAD (reg:V4SI 99)
>         (expr_list:REG_DEAD (reg:V4SI 98)
>             (expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [
>                             (const_int 0 [0]) repeated x4
>                         ])
>                     (reg:V4SI 99))
>                 (nil)))))
>
> It looks like the test should then pass through a 3, 9 -> 5 combination,
> so that we get an (eq ...) between two zeros and fold it to a vector
> of -1s.  But although the combination is attempted, the fold doesn't
> happen.  Instead, combine is left to match the unsimplified (eq ...)
> between two zeros, which rightly fails.  The test only passes because
> late_combine2 happens to try simplifying an (eq ...) between reg X and
> reg X, which does fold to a vector of -1s.
>
> The different handling of registers and constants is due to this
> code in simplify_const_relational_operation:
>
>   if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx
>       && (code == EQ || code == NE)
>       && ! ((REG_P (op0) || CONST_INT_P (trueop0))
>             && (REG_P (op1) || CONST_INT_P (trueop1)))
>       && (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0
>       /* We cannot do this if tem is a nonzero address.  */
>       && ! nonzero_address_p (tem))
>     return simplify_const_relational_operation (signed_condition (code),
>                                                 mode, tem, const0_rtx);
>
> INTEGRAL_MODE_P matches vector integer modes, but everything else
> about the condition is written for scalar integers only.  Thus if
> trueop0 and trueop1 are equal vector constants, we'll bypass all
> the exclusions and try simplifying a subtraction.  This will succeed,
> giving a vector of zeros.  The recursive call will then try to simplify
> a comparison between the vector of zeros and const0_rtx, which isn't
> well-formed.  Luckily or unluckily, the ill-formedness doesn't trigger
> an ICE, but it does prevent any simplification from happening.
>
> The least-effort fix would be to replace INTEGRAL_MODE_P with
> SCALAR_INT_MODE_P.  But the fold does make conceptual sense for
> vectors too, so it seemed better to keep the INTEGRAL_MODE_P and
> generalise the rest of the condition to match.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?
>
> I'm hoping to post the actual patch for PR116398 later today.
>
> Richard
>
>
> gcc/
>         * simplify-rtx.cc (simplify_const_relational_operation): Generalize
>         the constant checks in the fold-via-minus path to match the
>         INTEGRAL_MODE_P condition.
> ---
>  gcc/simplify-rtx.cc | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index fe007bc7d96..6f969effdf9 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -6657,15 +6657,20 @@ simplify_const_relational_operation (enum rtx_code 
> code,
>       we do not know the signedness of the operation on either the left or
>       the right hand side of the comparison.  */
>
> -  if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx
> +  if (INTEGRAL_MODE_P (mode)
> +      && trueop1 != CONST0_RTX (mode)
>        && (code == EQ || code == NE)
> -      && ! ((REG_P (op0) || CONST_INT_P (trueop0))
> -           && (REG_P (op1) || CONST_INT_P (trueop1)))
> +      && ! ((REG_P (op0)
> +            || CONST_SCALAR_INT_P (trueop0)
> +            || CONST_VECTOR_P (trueop0))
> +           && (REG_P (op1)
> +               || CONST_SCALAR_INT_P (trueop1)
> +               || CONST_VECTOR_P (trueop1)))
>        && (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0
>        /* We cannot do this if tem is a nonzero address.  */
>        && ! nonzero_address_p (tem))
>      return simplify_const_relational_operation (signed_condition (code),
> -                                               mode, tem, const0_rtx);
> +                                               mode, tem, CONST0_RTX (mode));
>
>    if (! HONOR_NANS (mode) && code == ORDERED)
>      return const_true_rtx;
> --
> 2.25.1
>

This patch also fixed:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863

I am checking in this patch to add 2 tests.

-- 
H.J.
From a9fc1b9dec92842b3a978183388c1833918776fd Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.to...@gmail.com>
Date: Sun, 20 Apr 2025 15:09:00 +0800
Subject: [PATCH] x86: Add tests for PR target/117863

commit 546f28f83ceba74dc8bf84b0435c0159ffca971a
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Mon Apr 7 08:03:46 2025 +0100

    simplify-rtx: Fix shortcut for vector eq/ne

fixed

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863

	PR target/117863
	* gcc.dg/rtl/i386/vector_eq-2.c: New test.
	* gcc.dg/rtl/i386/vector_eq-3.c: Likewise.

Signed-off-by: H.J. Lu <hjl.to...@gmail.com>
---
 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c | 71 ++++++++++++++++++++
 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c | 74 +++++++++++++++++++++
 2 files changed, 145 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
 create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c

diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
new file mode 100644
index 00000000000..871d489b730
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c
@@ -0,0 +1,71 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-O2 -march=x86-64-v3" } */
+
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v2di __attribute__((vector_size(16)));
+
+v4si __RTL (startwith ("vregs1")) foo1 (void)
+{
+(function "foo1"
+  (insn-chain
+    (block 2
+      (edge-from entry (flags "FALLTHRU"))
+      (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+      (cnote 2 NOTE_INSN_FUNCTION_BEG)
+      (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
+      (cinsn 4 (set (reg:V4SI <1>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
+      (cinsn 5 (set (reg:V4SI <2>)
+		    (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>))))
+      (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>)))
+      (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>)))
+      (edge-to exit (flags "FALLTHRU"))
+    )
+  )
+ (crtl (return_rtx (reg/i:V4SI xmm0)))
+)
+}
+
+v8si __RTL (startwith ("vregs1")) foo2 (void)
+{
+(function "foo2"
+  (insn-chain
+    (block 2
+      (edge-from entry (flags "FALLTHRU"))
+      (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+      (cnote 2 NOTE_INSN_FUNCTION_BEG)
+      (cinsn 3 (set (reg:V8SI <0>) (const_vector:V8SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
+      (cinsn 4 (set (reg:V8SI <1>) (const_vector:V8SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1)])))
+      (cinsn 5 (set (reg:V8SI <2>)
+		    (eq:V8SI (reg:V8SI <0>) (reg:V8SI <1>))))
+      (cinsn 6 (set (reg:V8SI <3>) (reg:V8SI <2>)))
+      (cinsn 7 (set (reg:V8SI xmm0) (reg:V8SI <3>)))
+      (edge-to exit (flags "FALLTHRU"))
+    )
+  )
+ (crtl (return_rtx (reg/i:V8SI xmm0)))
+)
+}
+
+v2di __RTL (startwith ("vregs1")) foo3 (void)
+{
+(function "foo3"
+  (insn-chain
+    (block 2
+      (edge-from entry (flags "FALLTHRU"))
+      (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+      (cnote 2 NOTE_INSN_FUNCTION_BEG)
+      (cinsn 3 (set (reg:V2DI <0>) (const_vector:V2DI [(const_int -1) (const_int -1)])))
+      (cinsn 4 (set (reg:V2DI <1>) (const_vector:V2DI [(const_int -1) (const_int -1)])))
+      (cinsn 5 (set (reg:V2DI <2>)
+		    (eq:V2DI (reg:V2DI <0>) (reg:V2DI <1>))))
+      (cinsn 6 (set (reg:V2DI <3>) (reg:V2DI <2>)))
+      (cinsn 7 (set (reg:V2DI xmm0) (reg:V2DI <3>)))
+      (edge-to exit (flags "FALLTHRU"))
+    )
+  )
+ (crtl (return_rtx (reg/i:V2DI xmm0)))
+)
+}
+
+/* { dg-final { scan-assembler-times "vpcmpeq" 3 } } */
diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c
new file mode 100644
index 00000000000..276c4c2cf4c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c
@@ -0,0 +1,74 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-O2 -march=x86-64-v3" } */
+
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v2di __attribute__((vector_size(16)));
+
+v4si __RTL (startwith ("vregs1")) foo1 (void)
+{
+(function "foo1"
+  (insn-chain
+    (block 2
+      (edge-from entry (flags "FALLTHRU"))
+      (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+      (cnote 2 NOTE_INSN_FUNCTION_BEG)
+      (cinsn 3 (set (reg:V4SI <1>)
+		    (mem:V4SI (reg:SI di) [0 ptr S128 A128])))
+      (cinsn 4 (set (reg:V4SI <2>)
+		    (eq:V4SI (reg:V4SI <1>)
+			     (mem:V4SI (reg:SI di) [0 ptr S128 A128]))))
+      (cinsn 5 (set (reg:V4SI <3>) (reg:V4SI <2>)))
+      (cinsn 6 (set (reg:V4SI xmm0) (reg:V4SI <3>)))
+      (edge-to exit (flags "FALLTHRU"))
+    )
+  )
+ (crtl (return_rtx (reg/i:V4SI xmm0)))
+)
+}
+
+v8si __RTL (startwith ("vregs1")) foo2 (void)
+{
+(function "foo2"
+  (insn-chain
+    (block 2
+      (edge-from entry (flags "FALLTHRU"))
+      (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+      (cnote 2 NOTE_INSN_FUNCTION_BEG)
+      (cinsn 3 (set (reg:V8SI <1>)
+		    (mem:V8SI (reg:SI di) [0 ptr S256 A256])))
+      (cinsn 4 (set (reg:V8SI <2>)
+		    (eq:V8SI (mem:V8SI (reg:SI di) [0 ptr S256 A256])
+			     (reg:V8SI <1>))))
+      (cinsn 5 (set (reg:V8SI <3>) (reg:V8SI <2>)))
+      (cinsn 6 (set (reg:V8SI xmm0) (reg:V8SI <3>)))
+      (edge-to exit (flags "FALLTHRU"))
+    )
+  )
+ (crtl (return_rtx (reg/i:V8SI xmm0)))
+)
+}
+
+v2di __RTL (startwith ("vregs1")) foo3 (void)
+{
+(function "foo3"
+  (insn-chain
+    (block 2
+      (edge-from entry (flags "FALLTHRU"))
+      (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK)
+      (cnote 2 NOTE_INSN_FUNCTION_BEG)
+      (cinsn 3 (set (reg:V2DI <1>)
+		    (mem:V2DI (reg:SI di) [0 ptr S128 A128])))
+      (cinsn 4 (set (reg:V2DI <2>)
+		    (eq:V2DI (reg:V2DI <1>)
+			     (mem:V2DI (reg:SI di) [0 ptr S128 A128]))))
+      (cinsn 5 (set (reg:V2DI <3>) (reg:V2DI <2>)))
+      (cinsn 6 (set (reg:V2DI xmm0) (reg:V2DI <3>)))
+      (edge-to exit (flags "FALLTHRU"))
+    )
+  )
+ (crtl (return_rtx (reg/i:V2DI xmm0)))
+)
+}
+
+/* { dg-final { scan-assembler-times "vpcmpeq" 3 } } */
-- 
2.49.0

Reply via email to