Re: [PATCH][PR121599] aarch64: Fix ICE when op2 is zero for SVE2 saturating add intrinsics.

Kyrylo Tkachov Mon, 25 Aug 2025 06:40:19 -0700

Hi Jennifer,

> On 25 Aug 2025, at 12:56, Jennifer Schmitz <jschm...@nvidia.com> wrote:
> 
> When op2 in SVE2 saturating add intrinsics (svuqadd, svsqadd) is a zero
> vector and predication is _z, an ICE in vregs occurs, e.g. for
> 
> svuint8_t foo (svbool_t pg, svuint8_t op1)
> {
>    return svsqadd_u8_z (pg, op1, svdup_s8 (0));
> }
> 
> The insn failed to match the pattern (aarch64-sve2.md):
> 
> ;; Predicated binary operations with no reverse form, merging with zero.
> ;; At present we don't generate these patterns via a cond_* optab,
> ;; so there's no correctness requirement to handle merging with an
> ;; independent value.
> (define_insn_and_rewrite "*cond_<sve_int_op><mode>_z"
>  [(set (match_operand:SVE_FULL_I 0 "register_operand")
> (unspec:SVE_FULL_I
>  [(match_operand:<VPRED> 1 "register_operand")
>   (unspec:SVE_FULL_I
>     [(match_operand 5)
>      (unspec:SVE_FULL_I
> [(match_operand:SVE_FULL_I 2 "register_operand")
> (match_operand:SVE_FULL_I 3 "register_operand")]
> SVE2_COND_INT_BINARY_NOREV)]
>     UNSPEC_PRED_X)
>   (match_operand:SVE_FULL_I 4 "aarch64_simd_imm_zero")]
>  UNSPEC_SEL))]
>  "TARGET_SVE2"
>  {@ [ cons: =0 , 1   , 2 , 3  ]
>     [ &w       , Upl , 0 , w  ] movprfx\t%0.<Vetype>, %1/z, 
> %0.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
>     [ &w       , Upl , w , w  ] movprfx\t%0.<Vetype>, %1/z, 
> %2.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
>  }
>  "&& !CONSTANT_P (operands[5])"
>  {
>    operands[5] = CONSTM1_RTX (<VPRED>mode);
>  }
>  [(set_attr "movprfx" "yes")]
> )
> 
> because operands[3] and operands[4] were both expanded into the same register
> operand containing a zero vector by
> 
> ;; Predicated binary arithmetic with merging.
> (define_expand "@cond_<sve_int_op><mode>"
>  [(set (match_operand:SVE_FULL_I 0 "register_operand")
> (unspec:SVE_FULL_I
>  [(match_operand:<VPRED> 1 "register_operand")
>   (unspec:SVE_FULL_I
>     [(match_dup 5)
>      (unspec:SVE_FULL_I
> [(match_operand:SVE_FULL_I 2 "register_operand")
> (match_operand:SVE_FULL_I 3 "register_operand")]
> SVE2_COND_INT_BINARY)]
>     UNSPEC_PRED_X)
>   (match_operand:SVE_FULL_I 4 "aarch64_simd_reg_or_zero")]
>  UNSPEC_SEL))]
>  "TARGET_SVE2"
>  {
>    operands[5] = CONSTM1_RTX (<MODE>mode);
>  }
> )
> 
> This patch fixes the ICE by changing the predicate of operands[3]
> in both patterns to aarch64_simd_reg_or_zero. It also adds tests
> adjusted from the report in PR121599.
> 
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for trunk?
> 
> Alex Coplan pointed out in the bugzilla ticket that this ICE goes back
> to GCC 10. Shall we backport?
> 
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
> 
> gcc/
> PR target/121599
> * config/aarch64/aarch64-sve2.md (@cond_<sve_int_op><mode>):
> Set predicate for operand 3 to aarch64_simd_reg_or_zero.
> (*cond_<sve_int_op><mode>_z): Likewise.
> 
> gcc/testsuite/
> PR target/121599
> * gcc.target/aarch64/sve2/pr121599.c: New test.
> ---
> gcc/config/aarch64/aarch64-sve2.md            |  4 +--
> .../gcc.target/aarch64/sve2/pr121599.c        | 31 +++++++++++++++++++
> 2 files changed, 33 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sve2.md 
> b/gcc/config/aarch64/aarch64-sve2.md
> index a3cbbce8b31..c6a84fa7f3d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -1021,7 +1021,7 @@
>     [(match_dup 5)
>      (unspec:SVE_FULL_I
> [(match_operand:SVE_FULL_I 2 "register_operand")
> - (match_operand:SVE_FULL_I 3 "register_operand")]
> + (match_operand:SVE_FULL_I 3 "aarch64_simd_reg_or_zero")]
> SVE2_COND_INT_BINARY)]
>     UNSPEC_PRED_X)
>   (match_operand:SVE_FULL_I 4 "aarch64_simd_reg_or_zero")]
> @@ -1136,7 +1136,7 @@
>     [(match_operand 5)
>      (unspec:SVE_FULL_I
> [(match_operand:SVE_FULL_I 2 "register_operand")
> - (match_operand:SVE_FULL_I 3 "register_operand")]
> + (match_operand:SVE_FULL_I 3 "aarch64_simd_reg_or_zero")]
> SVE2_COND_INT_BINARY_NOREV)]
>     UNSPEC_PRED_X)
>   (match_operand:SVE_FULL_I 4 "aarch64_simd_imm_zero")]


I think the ICE happens because operands[4] ends up being a register which the 
aarch64_simd_imm_zero can’t handle.
The pattern *cond_<sve_int_op><mode>_z is also not supposed to be handling 
register operands, it only expresses a merge with zero.
Allowing a zero in operands[3] here is problematic as the output code wouldn’t 
know how to print it if the midend used a zero vector for it.

I think the right solution is to adjust the @cond_<sve_int_op><mode> expander 
to only allow aarch64_simd_imm_zero for operands[4] for the 
SVE2_COND_INT_BINARY_NOREV subset of SVE2_COND_INT_BINARY.

Thanks,
Kyrill

> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c 
> b/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c
> new file mode 100644
> index 00000000000..cd80ef9115c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr121599.c
> @@ -0,0 +1,31 @@
> +/* PR target/121599.  */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +#include <arm_sve.h>
> +
> +/*
> +** foo:
> +** movi d([0-9]+), #0
> +** movprfx z0\.b, p0/z, z0\.b
> +** usqadd z0\.b, p0/m, z0\.b, z\1\.b
> +** ret
> +*/
> +svuint8_t foo (svbool_t pg, svuint8_t op1)
> +{
> +    return svsqadd_u8_z (pg, op1, svdup_s8 (0));
> +}
> +
> +/*
> +** bar:
> +** movi d([0-9]+), #0
> +** movprfx z0\.b, p0/z, z0\.b
> +** suqadd z0\.b, p0/m, z0\.b, z\1\.b
> +** ret
> +*/
> +svint8_t bar (svbool_t pg, svint8_t op1)
> +{
> +    return svuqadd_n_s8_z (pg, op1, 0);
> +}
> +
> -- 
> 2.34.1
>

Re: [PATCH][PR121599] aarch64: Fix ICE when op2 is zero for SVE2 saturating add intrinsics.

Reply via email to