On Tue, Apr 1, 2025 at 8:17 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > This patch forestalls a regression in gcc.dg/rtl/x86_64/vector_eq.c > with the patch for PR116398. The test wants: > > (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int 0) > (const_int 0) (const_int 0) (const_int 0)]))) > (cinsn 5 (set (reg:V4SI <2>) > (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>)))) > > to be folded to a vector of -1s. One unusual thing about the fold > is that the <1> in the second insn is uninitialised; it looks like > it should be replaced by <0>, or that there should be an insn 4 that > copies <0> to <1>. > > As it stands, the test relies on init-regs to insert a zero > initialisation of <1>. This happens after all the cse/pre/fwprop > stuff, with only dce passes between init-regs and combine. > Combine therefore sees: > > (insn 3 2 8 2 (set (reg:V4SI 98) > (const_vector:V4SI [ > (const_int 0 [0]) repeated x4 > ])) 2403 {movv4si_internal} > (nil)) > (insn 8 3 9 2 (clobber (reg:V4SI 99)) -1 > (nil)) > (insn 9 8 5 2 (set (reg:V4SI 99) > (const_vector:V4SI [ > (const_int 0 [0]) repeated x4 > ])) -1 > (nil)) > (insn 5 9 7 2 (set (reg:V4SI 100) > (eq:V4SI (reg:V4SI 98) > (reg:V4SI 99))) 7874 {*sse2_eqv4si3} > (expr_list:REG_DEAD (reg:V4SI 99) > (expr_list:REG_DEAD (reg:V4SI 98) > (expr_list:REG_EQUAL (eq:V4SI (const_vector:V4SI [ > (const_int 0 [0]) repeated x4 > ]) > (reg:V4SI 99)) > (nil))))) > > It looks like the test should then pass through a 3, 9 -> 5 combination, > so that we get an (eq ...) between two zeros and fold it to a vector > of -1s. But although the combination is attempted, the fold doesn't > happen. Instead, combine is left to match the unsimplified (eq ...) > between two zeros, which rightly fails. The test only passes because > late_combine2 happens to try simplifying an (eq ...) between reg X and > reg X, which does fold to a vector of -1s. > > The different handling of registers and constants is due to this > code in simplify_const_relational_operation: > > if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx > && (code == EQ || code == NE) > && ! ((REG_P (op0) || CONST_INT_P (trueop0)) > && (REG_P (op1) || CONST_INT_P (trueop1))) > && (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0 > /* We cannot do this if tem is a nonzero address. */ > && ! nonzero_address_p (tem)) > return simplify_const_relational_operation (signed_condition (code), > mode, tem, const0_rtx); > > INTEGRAL_MODE_P matches vector integer modes, but everything else > about the condition is written for scalar integers only. Thus if > trueop0 and trueop1 are equal vector constants, we'll bypass all > the exclusions and try simplifying a subtraction. This will succeed, > giving a vector of zeros. The recursive call will then try to simplify > a comparison between the vector of zeros and const0_rtx, which isn't > well-formed. Luckily or unluckily, the ill-formedness doesn't trigger > an ICE, but it does prevent any simplification from happening. > > The least-effort fix would be to replace INTEGRAL_MODE_P with > SCALAR_INT_MODE_P. But the fold does make conceptual sense for > vectors too, so it seemed better to keep the INTEGRAL_MODE_P and > generalise the rest of the condition to match. > > Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install? > > I'm hoping to post the actual patch for PR116398 later today. > > Richard > > > gcc/ > * simplify-rtx.cc (simplify_const_relational_operation): Generalize > the constant checks in the fold-via-minus path to match the > INTEGRAL_MODE_P condition. > --- > gcc/simplify-rtx.cc | 13 +++++++++---- > 1 file changed, 9 insertions(+), 4 deletions(-) > > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index fe007bc7d96..6f969effdf9 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -6657,15 +6657,20 @@ simplify_const_relational_operation (enum rtx_code > code, > we do not know the signedness of the operation on either the left or > the right hand side of the comparison. */ > > - if (INTEGRAL_MODE_P (mode) && trueop1 != const0_rtx > + if (INTEGRAL_MODE_P (mode) > + && trueop1 != CONST0_RTX (mode) > && (code == EQ || code == NE) > - && ! ((REG_P (op0) || CONST_INT_P (trueop0)) > - && (REG_P (op1) || CONST_INT_P (trueop1))) > + && ! ((REG_P (op0) > + || CONST_SCALAR_INT_P (trueop0) > + || CONST_VECTOR_P (trueop0)) > + && (REG_P (op1) > + || CONST_SCALAR_INT_P (trueop1) > + || CONST_VECTOR_P (trueop1))) > && (tem = simplify_binary_operation (MINUS, mode, op0, op1)) != 0 > /* We cannot do this if tem is a nonzero address. */ > && ! nonzero_address_p (tem)) > return simplify_const_relational_operation (signed_condition (code), > - mode, tem, const0_rtx); > + mode, tem, CONST0_RTX (mode)); > > if (! HONOR_NANS (mode) && code == ORDERED) > return const_true_rtx; > -- > 2.25.1 >
This patch also fixed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863 I am checking in this patch to add 2 tests. -- H.J.
From a9fc1b9dec92842b3a978183388c1833918776fd Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.to...@gmail.com> Date: Sun, 20 Apr 2025 15:09:00 +0800 Subject: [PATCH] x86: Add tests for PR target/117863 commit 546f28f83ceba74dc8bf84b0435c0159ffca971a Author: Richard Sandiford <richard.sandif...@arm.com> Date: Mon Apr 7 08:03:46 2025 +0100 simplify-rtx: Fix shortcut for vector eq/ne fixed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117863 PR target/117863 * gcc.dg/rtl/i386/vector_eq-2.c: New test. * gcc.dg/rtl/i386/vector_eq-3.c: Likewise. Signed-off-by: H.J. Lu <hjl.to...@gmail.com> --- gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c | 71 ++++++++++++++++++++ gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c | 74 +++++++++++++++++++++ 2 files changed, 145 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c create mode 100644 gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c new file mode 100644 index 00000000000..871d489b730 --- /dev/null +++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-2.c @@ -0,0 +1,71 @@ +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-additional-options "-O2 -march=x86-64-v3" } */ + +typedef int v4si __attribute__((vector_size(16))); +typedef int v8si __attribute__((vector_size(32))); +typedef int v2di __attribute__((vector_size(16))); + +v4si __RTL (startwith ("vregs1")) foo1 (void) +{ +(function "foo1" + (insn-chain + (block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V4SI <0>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)]))) + (cinsn 4 (set (reg:V4SI <1>) (const_vector:V4SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1)]))) + (cinsn 5 (set (reg:V4SI <2>) + (eq:V4SI (reg:V4SI <0>) (reg:V4SI <1>)))) + (cinsn 6 (set (reg:V4SI <3>) (reg:V4SI <2>))) + (cinsn 7 (set (reg:V4SI xmm0) (reg:V4SI <3>))) + (edge-to exit (flags "FALLTHRU")) + ) + ) + (crtl (return_rtx (reg/i:V4SI xmm0))) +) +} + +v8si __RTL (startwith ("vregs1")) foo2 (void) +{ +(function "foo2" + (insn-chain + (block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V8SI <0>) (const_vector:V8SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1)]))) + (cinsn 4 (set (reg:V8SI <1>) (const_vector:V8SI [(const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1) (const_int -1)]))) + (cinsn 5 (set (reg:V8SI <2>) + (eq:V8SI (reg:V8SI <0>) (reg:V8SI <1>)))) + (cinsn 6 (set (reg:V8SI <3>) (reg:V8SI <2>))) + (cinsn 7 (set (reg:V8SI xmm0) (reg:V8SI <3>))) + (edge-to exit (flags "FALLTHRU")) + ) + ) + (crtl (return_rtx (reg/i:V8SI xmm0))) +) +} + +v2di __RTL (startwith ("vregs1")) foo3 (void) +{ +(function "foo3" + (insn-chain + (block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V2DI <0>) (const_vector:V2DI [(const_int -1) (const_int -1)]))) + (cinsn 4 (set (reg:V2DI <1>) (const_vector:V2DI [(const_int -1) (const_int -1)]))) + (cinsn 5 (set (reg:V2DI <2>) + (eq:V2DI (reg:V2DI <0>) (reg:V2DI <1>)))) + (cinsn 6 (set (reg:V2DI <3>) (reg:V2DI <2>))) + (cinsn 7 (set (reg:V2DI xmm0) (reg:V2DI <3>))) + (edge-to exit (flags "FALLTHRU")) + ) + ) + (crtl (return_rtx (reg/i:V2DI xmm0))) +) +} + +/* { dg-final { scan-assembler-times "vpcmpeq" 3 } } */ diff --git a/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c new file mode 100644 index 00000000000..276c4c2cf4c --- /dev/null +++ b/gcc/testsuite/gcc.dg/rtl/i386/vector_eq-3.c @@ -0,0 +1,74 @@ +/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-additional-options "-O2 -march=x86-64-v3" } */ + +typedef int v4si __attribute__((vector_size(16))); +typedef int v8si __attribute__((vector_size(32))); +typedef int v2di __attribute__((vector_size(16))); + +v4si __RTL (startwith ("vregs1")) foo1 (void) +{ +(function "foo1" + (insn-chain + (block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V4SI <1>) + (mem:V4SI (reg:SI di) [0 ptr S128 A128]))) + (cinsn 4 (set (reg:V4SI <2>) + (eq:V4SI (reg:V4SI <1>) + (mem:V4SI (reg:SI di) [0 ptr S128 A128])))) + (cinsn 5 (set (reg:V4SI <3>) (reg:V4SI <2>))) + (cinsn 6 (set (reg:V4SI xmm0) (reg:V4SI <3>))) + (edge-to exit (flags "FALLTHRU")) + ) + ) + (crtl (return_rtx (reg/i:V4SI xmm0))) +) +} + +v8si __RTL (startwith ("vregs1")) foo2 (void) +{ +(function "foo2" + (insn-chain + (block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V8SI <1>) + (mem:V8SI (reg:SI di) [0 ptr S256 A256]))) + (cinsn 4 (set (reg:V8SI <2>) + (eq:V8SI (mem:V8SI (reg:SI di) [0 ptr S256 A256]) + (reg:V8SI <1>)))) + (cinsn 5 (set (reg:V8SI <3>) (reg:V8SI <2>))) + (cinsn 6 (set (reg:V8SI xmm0) (reg:V8SI <3>))) + (edge-to exit (flags "FALLTHRU")) + ) + ) + (crtl (return_rtx (reg/i:V8SI xmm0))) +) +} + +v2di __RTL (startwith ("vregs1")) foo3 (void) +{ +(function "foo3" + (insn-chain + (block 2 + (edge-from entry (flags "FALLTHRU")) + (cnote 1 [bb 2] NOTE_INSN_BASIC_BLOCK) + (cnote 2 NOTE_INSN_FUNCTION_BEG) + (cinsn 3 (set (reg:V2DI <1>) + (mem:V2DI (reg:SI di) [0 ptr S128 A128]))) + (cinsn 4 (set (reg:V2DI <2>) + (eq:V2DI (reg:V2DI <1>) + (mem:V2DI (reg:SI di) [0 ptr S128 A128])))) + (cinsn 5 (set (reg:V2DI <3>) (reg:V2DI <2>))) + (cinsn 6 (set (reg:V2DI xmm0) (reg:V2DI <3>))) + (edge-to exit (flags "FALLTHRU")) + ) + ) + (crtl (return_rtx (reg/i:V2DI xmm0))) +) +} + +/* { dg-final { scan-assembler-times "vpcmpeq" 3 } } */ -- 2.49.0