> On 25 Aug 2025, at 15:40, Kyrylo Tkachov <ktkac...@nvidia.com> wrote: > > Hi Jennifer, > >> On 25 Aug 2025, at 12:56, Jennifer Schmitz <jschm...@nvidia.com> wrote: >> >> When op2 in SVE2 saturating add intrinsics (svuqadd, svsqadd) is a zero >> vector and predication is _z, an ICE in vregs occurs, e.g. for >> >> svuint8_t foo (svbool_t pg, svuint8_t op1) >> { >> return svsqadd_u8_z (pg, op1, svdup_s8 (0)); >> } >> >> The insn failed to match the pattern (aarch64-sve2.md): >> >> ;; Predicated binary operations with no reverse form, merging with zero. >> ;; At present we don't generate these patterns via a cond_* optab, >> ;; so there's no correctness requirement to handle merging with an >> ;; independent value. >> (define_insn_and_rewrite "*cond_<sve_int_op><mode>_z" >> [(set (match_operand:SVE_FULL_I 0 "register_operand") >> (unspec:SVE_FULL_I >> [(match_operand:<VPRED> 1 "register_operand") >> (unspec:SVE_FULL_I >> [(match_operand 5) >> (unspec:SVE_FULL_I >> [(match_operand:SVE_FULL_I 2 "register_operand") >> (match_operand:SVE_FULL_I 3 "register_operand")] >> SVE2_COND_INT_BINARY_NOREV)] >> UNSPEC_PRED_X) >> (match_operand:SVE_FULL_I 4 "aarch64_simd_imm_zero")] >> UNSPEC_SEL))] >> "TARGET_SVE2" >> {@ [ cons: =0 , 1 , 2 , 3 ] >> [ &w , Upl , 0 , w ] movprfx\t%0.<Vetype>, %1/z, >> %0.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> >> [ &w , Upl , w , w ] movprfx\t%0.<Vetype>, %1/z, >> %2.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> >> } >> "&& !CONSTANT_P (operands[5])" >> { >> operands[5] = CONSTM1_RTX (<VPRED>mode); >> } >> [(set_attr "movprfx" "yes")] >> ) >> >> because operands[3] and operands[4] were both expanded into the same register >> operand containing a zero vector by >> >> ;; Predicated binary arithmetic with merging. >> (define_expand "@cond_<sve_int_op><mode>" >> [(set (match_operand:SVE_FULL_I 0 "register_operand") >> (unspec:SVE_FULL_I >> [(match_operand:<VPRED> 1 "register_operand") >> (unspec:SVE_FULL_I >> [(match_dup 5) >> (unspec:SVE_FULL_I >> [(match_operand:SVE_FULL_I 2 "register_operand") >> (match_operand:SVE_FULL_I 3 "register_operand")] >> SVE2_COND_INT_BINARY)] >> UNSPEC_PRED_X) >> (match_operand:SVE_FULL_I 4 "aarch64_simd_reg_or_zero")] >> UNSPEC_SEL))] >> "TARGET_SVE2" >> { >> operands[5] = CONSTM1_RTX (<MODE>mode); >> } >> ) >> >> This patch fixes the ICE by changing the predicate of operands[3] >> in both patterns to aarch64_simd_reg_or_zero. It also adds tests >> adjusted from the report in PR121599. >> >> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression. >> OK for trunk? >> >> Alex Coplan pointed out in the bugzilla ticket that this ICE goes back >> to GCC 10. Shall we backport? >> >> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> >> >> gcc/ >> PR target/121599 >> * config/aarch64/aarch64-sve2.md (@cond_<sve_int_op><mode>): >> Set predicate for operand 3 to aarch64_simd_reg_or_zero. >> (*cond_<sve_int_op><mode>_z): Likewise. >> >> gcc/testsuite/ >> PR target/121599 >> * gcc.target/aarch64/sve2/pr121599.c: New test. >> --- >> gcc/config/aarch64/aarch64-sve2.md | 4 +-- >> .../gcc.target/aarch64/sve2/pr121599.c | 31 +++++++++++++++++++ >> 2 files changed, 33 insertions(+), 2 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c >> >> diff --git a/gcc/config/aarch64/aarch64-sve2.md >> b/gcc/config/aarch64/aarch64-sve2.md >> index a3cbbce8b31..c6a84fa7f3d 100644 >> --- a/gcc/config/aarch64/aarch64-sve2.md >> +++ b/gcc/config/aarch64/aarch64-sve2.md >> @@ -1021,7 +1021,7 @@ >> [(match_dup 5) >> (unspec:SVE_FULL_I >> [(match_operand:SVE_FULL_I 2 "register_operand") >> - (match_operand:SVE_FULL_I 3 "register_operand")] >> + (match_operand:SVE_FULL_I 3 "aarch64_simd_reg_or_zero")] >> SVE2_COND_INT_BINARY)] >> UNSPEC_PRED_X) >> (match_operand:SVE_FULL_I 4 "aarch64_simd_reg_or_zero")] >> @@ -1136,7 +1136,7 @@ >> [(match_operand 5) >> (unspec:SVE_FULL_I >> [(match_operand:SVE_FULL_I 2 "register_operand") >> - (match_operand:SVE_FULL_I 3 "register_operand")] >> + (match_operand:SVE_FULL_I 3 "aarch64_simd_reg_or_zero")] >> SVE2_COND_INT_BINARY_NOREV)] >> UNSPEC_PRED_X) >> (match_operand:SVE_FULL_I 4 "aarch64_simd_imm_zero")] > > I think the ICE happens because operands[4] ends up being a register which > the aarch64_simd_imm_zero can’t handle. > The pattern *cond_<sve_int_op><mode>_z is also not supposed to be handling > register operands, it only expresses a merge with zero. > Allowing a zero in operands[3] here is problematic as the output code > wouldn’t know how to print it if the midend used a zero vector for it. > > I think the right solution is to adjust the @cond_<sve_int_op><mode> expander > to only allow aarch64_simd_imm_zero for operands[4] for the > SVE2_COND_INT_BINARY_NOREV subset of SVE2_COND_INT_BINARY. Hi Kyrill, Thanks for reviewing the patch. I tried allowing only aarch64_simd_imm_zero for operands[4] for SVE2_COND_INT_BINARY_NOREV, but while that solved the ICE it caused all tests predication other than _z to fail. I chose instead to add a pattern similar to the existing *cond_<sve_int_op><mode>_3 for SVE2_COND_INT_BINARY_NOREV that handles the case when operands[3] and operands[4] are the same register operand. Thanks, Jennifer
When op2 in SVE2 saturating add intrinsics (svuqadd, svsqadd) is a zero vector and predication is _z, an ICE in vregs occurs, e.g. for svuint8_t foo (svbool_t pg, svuint8_t op1) { return svsqadd_u8_z (pg, op1, svdup_s8 (0)); } The insn failed to match the pattern (aarch64-sve2.md): ;; Predicated binary operations with no reverse form, merging with zero. ;; At present we don't generate these patterns via a cond_* optab, ;; so there's no correctness requirement to handle merging with an ;; independent value. (define_insn_and_rewrite "*cond_<sve_int_op><mode>_z" [(set (match_operand:SVE_FULL_I 0 "register_operand") (unspec:SVE_FULL_I [(match_operand:<VPRED> 1 "register_operand") (unspec:SVE_FULL_I [(match_operand 5) (unspec:SVE_FULL_I [(match_operand:SVE_FULL_I 2 "register_operand") (match_operand:SVE_FULL_I 3 "register_operand")] SVE2_COND_INT_BINARY_NOREV)] UNSPEC_PRED_X) (match_operand:SVE_FULL_I 4 "aarch64_simd_imm_zero")] UNSPEC_SEL))] "TARGET_SVE2" {@ [ cons: =0 , 1 , 2 , 3 ] [ &w , Upl , 0 , w ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> [ &w , Upl , w , w ] movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype> } "&& !CONSTANT_P (operands[5])" { operands[5] = CONSTM1_RTX (<VPRED>mode); } [(set_attr "movprfx" "yes")] ) because operands[3] and operands[4] were both expanded into the same register operand containing a zero vector by define_expand "@cond_<sve_int_op><mode>". This patch fixes the ICE by adding the pattern ;; Predicated binary operations with no reverse form, merging with ;; the second input. (define_insn_and_rewrite "*cond_<sve_int_op><mode>_3" for SVE2_COND_INT_BINARY_NOREV that allows operands[4] to be a register operand (equal to operands[3]). It adds a vcond_mask expression such that we end up with an insn matched by ;; Predicated binary arithmetic, merging with the first input. (define_insn_and_rewrite "*cond_<sve_int_op><mode>_2" The patch was bootstrapped and tested on aarch64-linux-gnu, no regression. OK for trunk? Alex Coplan pointed out in the bugzilla ticket that this ICE goes back to GCC 10. Shall we backport? Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com> gcc/ PR target/121599 * config/aarch64/aarch64-sve2.md (*cond_<sve_int_op><mode>_3): New pattern for SVE2_COND_INT_BINARY_NOREV. gcc/testsuite/ PR target/121599 * gcc.target/aarch64/sve2/pr121599.c: New test. --- gcc/config/aarch64/aarch64-sve2.md | 36 +++++++++++++++++++ .../gcc.target/aarch64/sve2/pr121599.c | 31 ++++++++++++++++ 2 files changed, 67 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md index a3cbbce8b31..d82dc41df3e 100644 --- a/gcc/config/aarch64/aarch64-sve2.md +++ b/gcc/config/aarch64/aarch64-sve2.md @@ -1124,6 +1124,42 @@ [(set_attr "movprfx" "yes")] ) +;; Predicated binary operations with no reverse form, merging with +;; the second input. +(define_insn_and_rewrite "*cond_<sve_int_op><mode>_3" + [(set (match_operand:SVE_FULL_I 0 "register_operand") + (unspec:SVE_FULL_I + [(match_operand:<VPRED> 1 "register_operand") + (unspec:SVE_FULL_I + [(match_operand 5) + (unspec:SVE_FULL_I + [(match_operand:SVE_FULL_I 2 "register_operand") + (match_operand:SVE_FULL_I 3 "register_operand")] + SVE2_COND_INT_BINARY_NOREV)] + UNSPEC_PRED_X) + (match_operand:SVE_FULL_I 4 "register_operand")] + UNSPEC_SEL))] + "TARGET_SVE2 + && rtx_equal_p (operands[3], operands[4])" + "#" + "&& 1" + { + if (reload_completed + && register_operand (operands[4], <MODE>mode) + && !rtx_equal_p (operands[0], operands[4])) + { + emit_insn (gen_vcond_mask_<mode><vpred> (operands[0], operands[2], + operands[4], operands[1])); + operands[4] = operands[2] = operands[0]; + } + else if (!CONSTANT_P (operands[5])) + operands[5] = CONSTM1_RTX (<VPRED>mode); + else + FAIL; + } + [(set_attr "movprfx" "yes")] +) + ;; Predicated binary operations with no reverse form, merging with zero. ;; At present we don't generate these patterns via a cond_* optab, ;; so there's no correctness requirement to handle merging with an diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c new file mode 100644 index 00000000000..dbfa613649d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c @@ -0,0 +1,31 @@ +/* PR target/121599. */ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include <arm_sve.h> + +/* +** foo: +** movi d([0-9]+), #0 +** sel z0\.b, p0, z0\.b, z\1\.b +** usqadd z0\.b, p0/m, z0\.b, z\1\.b +** ret +*/ +svuint8_t foo (svbool_t pg, svuint8_t op1) +{ + return svsqadd_u8_z (pg, op1, svdup_s8 (0)); +} + +/* +** bar: +** movi d([0-9]+), #0 +** sel z0\.b, p0, z0\.b, z\1\.b +** suqadd z0\.b, p0/m, z0\.b, z\1\.b +** ret +*/ +svint8_t bar (svbool_t pg, svint8_t op1) +{ + return svuqadd_n_s8_z (pg, op1, 0); +} + -- 2.34.1 > > Thanks, > Kyrill > >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c >> b/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c >> new file mode 100644 >> index 00000000000..cd80ef9115c >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c >> @@ -0,0 +1,31 @@ >> +/* PR target/121599. */ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2" } */ >> +/* { dg-final { check-function-bodies "**" "" "" } } */ >> + >> +#include <arm_sve.h> >> + >> +/* >> +** foo: >> +** movi d([0-9]+), #0 >> +** movprfx z0\.b, p0/z, z0\.b >> +** usqadd z0\.b, p0/m, z0\.b, z\1\.b >> +** ret >> +*/ >> +svuint8_t foo (svbool_t pg, svuint8_t op1) >> +{ >> + return svsqadd_u8_z (pg, op1, svdup_s8 (0)); >> +} >> + >> +/* >> +** bar: >> +** movi d([0-9]+), #0 >> +** movprfx z0\.b, p0/z, z0\.b >> +** suqadd z0\.b, p0/m, z0\.b, z\1\.b >> +** ret >> +*/ >> +svint8_t bar (svbool_t pg, svint8_t op1) >> +{ >> + return svuqadd_n_s8_z (pg, op1, 0); >> +} >> + >> -- >> 2.34.1 >> >