RE: [PATCH][PR121602] aarch64: Force vector when folding svmul with all-ones op1.

Tamar Christina Mon, 01 Sep 2025 06:47:15 -0700

> -----Original Message-----
> From: Jennifer Schmitz <jschm...@nvidia.com>
> Sent: Friday, August 29, 2025 1:17 PM
> To: GCC Patches <gcc-patches@gcc.gnu.org>
> Cc: Alex Coplan <alex.cop...@arm.com>; Kyrylo Tkachov
> <ktkac...@nvidia.com>; Tamar Christina <tamar.christ...@arm.com>; Richard
> Earnshaw <richard.earns...@arm.com>; Andrew Pinski <pins...@gmail.com>
> Subject: [PATCH][PR121602] aarch64: Force vector when folding svmul with all-ones op1.
> 
> An ICE was reported in the following test case:
> svint8_t foo(svbool_t pg, int8_t op2) {
>       return svmul_n_s8_z(pg, svdup_s8(1), op2);
> }
> with a type mismatch in 'vec_cond_expr':
> _4 = VEC_COND_EXPR <v16_2(D), v32_3(D), { 0, ... }>;
> 
> The reason is that svmul_impl::fold uses
> gimple_folder::fold_active_lanes_to to fold calls in which one of the
> operands is all ones to the other operand. However, fold_active_lanes_to
> implicitly assumes that its argument has a vector type. In the given test
> case op2 is a scalar, which results in the type mismatch in the
> VEC_COND_EXPR.
> 
> This patch fixes the ICE by forcing op2 to a vector type before
> svmul_impl::fold calls fold_active_lanes_to.
> 
> A more general option would be to move force_vector to fold_active_lanes_to.
>
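For illustration, the identity behind this fold can be modelled in a few lines of plain C++. This is a toy sketch under stated assumptions, not GCC code: Vec, Pred, dup, mul_z and select_or_zero are hypothetical stand-ins for an SVE vector, a predicate, svdup, the zeroing svmul and VEC_COND_EXPR <pg, v, { 0, ... }>, and a fixed lane count replaces the scalable vector length. select_or_zero only accepts a vector, so a scalar op2 has to be broadcast first, which is the job force_vector does in the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model (assumption): a fixed 8-lane vector stands in for a scalable SVE vector.
constexpr std::size_t kLanes = 8;
using Vec = std::array<int8_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Hypothetical analogue of svdup: broadcast a scalar to every lane.
Vec dup(int8_t x) {
  Vec v{};
  v.fill(x);
  return v;
}

// Hypothetical analogue of the zeroing svmul (_z): active lanes get
// a[i] * b[i], inactive lanes are set to zero.
Vec mul_z(const Pred& pg, const Vec& a, const Vec& b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = pg[i] ? static_cast<int8_t>(a[i] * b[i]) : 0;
  return r;
}

// Hypothetical analogue of VEC_COND_EXPR <pg, v, { 0, ... }>, i.e. what the
// fold produces. It takes a vector, so a scalar must be broadcast beforehand.
Vec select_or_zero(const Pred& pg, const Vec& v) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = pg[i] ? v[i] : 0;
  return r;
}
```

With pg = {1, 0, 1, 1, 0, 0, 1, 0} and op2 = -5, both mul_z (pg, dup (1), dup (op2)) and select_or_zero (pg, dup (op2)) give {-5, 0, -5, -5, 0, 0, -5, 0}: active lanes take op2 and inactive lanes are zero, which is what the movprfx (zeroing) plus predicated mov sequence in the new tests computes.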


I was wondering why the constant version doesn't need the fixup, e.g.

#include <arm_sve.h>

svint8_t foo(svbool_t pg, int8_t op2) {
      return svmul_n_s8_x(pg, svdup_s8(1), 3);
}

It seems this is because fold_const_binary does the scalar-to-vector fixup in
vector_const_binop.

To me this indicates that we should indeed do the fixup in
fold_active_lanes_to, to be consistent.
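The consistency argument can be sketched with a toy C++ model. It is not GCC code: Operand, force_vector and fold_active_lanes_to below are hypothetical simplifications that only borrow the names of the GCC helpers, and a fixed 4-lane vector is assumed. Here the scalar-to-vector normalization lives inside the folding helper, much as vector_const_binop handles it for constants, so a caller such as svmul_impl::fold needs no fixup of its own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>

// Toy model (assumption): fixed 4-lane vectors instead of scalable SVE vectors.
constexpr std::size_t kLanes = 4;
using Vec = std::array<int32_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// An operand may be a scalar (as for the _n forms) or a full vector.
using Operand = std::variant<int32_t, Vec>;

// Hypothetical analogue of force_vector: broadcast a scalar to all lanes,
// pass a vector through unchanged.
Vec force_vector(const Operand& op) {
  if (const auto* s = std::get_if<int32_t>(&op)) {
    Vec v{};
    v.fill(*s);
    return v;
  }
  return std::get<Vec>(op);
}

// Hypothetical analogue of fold_active_lanes_to with the normalization moved
// inside: active lanes take the operand, inactive lanes become zero.
Vec fold_active_lanes_to(const Pred& pg, const Operand& x) {
  Vec v = force_vector(x);
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = pg[i] ? v[i] : 0;
  return r;
}
```

In this model, because the broadcast happens in one place, a scalar operand and a vector operand go through the same path, so a new caller passing an _n operand cannot reintroduce the type mismatch.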

> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for trunk?
> OK to backport to GCC 15?

OK with moving it to fold_active_lanes_to unless someone disagrees.

Thanks,
Tamar

> 
> Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>
> 
> gcc/
>       PR target/121602
>       * config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
>       Force vector type before calling fold_active_lanes_to.
> 
> gcc/testsuite/
>       PR target/121602
>       * gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
>       * gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
>       * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
>       * gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
>       * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>       * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>       * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
>       * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64-sve-builtins-base.cc        |  7 ++++++-
>  .../gcc.target/aarch64/sve/acle/asm/mul_s16.c          | 10 ++++++++++
>  .../gcc.target/aarch64/sve/acle/asm/mul_s32.c          | 10 ++++++++++
>  .../gcc.target/aarch64/sve/acle/asm/mul_s64.c          | 10 ++++++++++
>  gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c | 10 ++++++++++
>  .../gcc.target/aarch64/sve/acle/asm/mul_u16.c          | 10 ++++++++++
>  .../gcc.target/aarch64/sve/acle/asm/mul_u32.c          | 10 ++++++++++
>  .../gcc.target/aarch64/sve/acle/asm/mul_u64.c          | 10 ++++++++++
>  gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c | 10 ++++++++++
>  9 files changed, 86 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index ecc06877cac..aaa7be0d4d1 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -2314,7 +2314,12 @@ public:
>      tree op1 = gimple_call_arg (f.call, 1);
>      tree op2 = gimple_call_arg (f.call, 2);
>      if (integer_onep (op1))
> -      return f.fold_active_lanes_to (op2);
> +      {
> +     gimple_seq stmts = NULL;
> +     op2 = f.force_vector (stmts, TREE_TYPE (f.lhs), op2);
> +     gsi_insert_seq_before (f.gsi, stmts, GSI_SAME_STMT);
> +     return f.fold_active_lanes_to (op2);
> +      }
>      if (integer_onep (op2))
>        return f.fold_active_lanes_to (op1);
> 
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
> index e9b6bf83b03..4148097cc63 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s16.c
> @@ -331,6 +331,16 @@ TEST_UNIFORM_Z (mul_1op1_s16_z_tied2, svint16_t,
>               z0 = svmul_s16_z (p0, svdup_s16 (1), z0),
>               z0 = svmul_z (p0, svdup_s16 (1), z0))
> 
> +/*
> +** mul_1op1n_s16_z:
> +**   movprfx z0\.h, p0/z, z0\.h
> +**   mov     z0\.h, p0/m, w0
> +**   ret
> +*/
> +TEST_UNIFORM_ZX (mul_1op1n_s16_z, svint16_t, int16_t,
> +     z0 = svmul_n_s16_z (p0, svdup_s16 (1), x0),
> +     z0 = svmul_z (p0, svdup_s16 (1), x0))
> +
>  /*
>  ** mul_3_s16_z_tied1:
>  **   mov     (z[0-9]+\.h), #3
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c
> index 71c476f48ca..2c53e3f14d6 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s32.c
> @@ -341,6 +341,16 @@ TEST_UNIFORM_Z (mul_1op1_s32_z_tied2, svint32_t,
>               z0 = svmul_s32_z (p0, svdup_s32 (1), z0),
>               z0 = svmul_z (p0, svdup_s32 (1), z0))
> 
> +/*
> +** mul_1op1n_s32_z:
> +**   movprfx z0\.s, p0/z, z0\.s
> +**   mov     z0\.s, p0/m, w0
> +**   ret
> +*/
> +TEST_UNIFORM_ZX (mul_1op1n_s32_z, svint32_t, int32_t,
> +     z0 = svmul_n_s32_z (p0, svdup_s32 (1), x0),
> +     z0 = svmul_z (p0, svdup_s32 (1), x0))
> +
>  /*
>  ** mul_3_s32_z_tied1:
>  **   mov     (z[0-9]+\.s), #3
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c
> index a34dc27740a..55342a13f8b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s64.c
> @@ -340,6 +340,16 @@ TEST_UNIFORM_Z (mul_1op1_s64_z_tied2, svint64_t,
>               z0 = svmul_s64_z (p0, svdup_s64 (1), z0),
>               z0 = svmul_z (p0, svdup_s64 (1), z0))
> 
> +/*
> +** mul_1op1n_s64_z:
> +**   movprfx z0\.d, p0/z, z0\.d
> +**   mov     z0\.d, p0/m, x0
> +**   ret
> +*/
> +TEST_UNIFORM_ZX (mul_1op1n_s64_z, svint64_t, int64_t,
> +     z0 = svmul_n_s64_z (p0, svdup_s64 (1), x0),
> +     z0 = svmul_z (p0, svdup_s64 (1), x0))
> +
>  /*
>  ** mul_2_s64_z_tied1:
>  **   movprfx z0.d, p0/z, z0.d
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
> index 683e15eccec..786a424eeea 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_s8.c
> @@ -331,6 +331,16 @@ TEST_UNIFORM_Z (mul_1op1_s8_z_tied2, svint8_t,
>               z0 = svmul_s8_z (p0, svdup_s8 (1), z0),
>               z0 = svmul_z (p0, svdup_s8 (1), z0))
> 
> +/*
> +** mul_1op1n_s8_z:
> +**   movprfx z0\.b, p0/z, z0\.b
> +**   mov     z0\.b, p0/m, w0
> +**   ret
> +*/
> +TEST_UNIFORM_ZX (mul_1op1n_s8_z, svint8_t, int8_t,
> +     z0 = svmul_n_s8_z (p0, svdup_s8 (1), x0),
> +     z0 = svmul_z (p0, svdup_s8 (1), x0))
> +
>  /*
>  ** mul_3_s8_z_tied1:
>  **   mov     (z[0-9]+\.b), #3
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> index e228dc5995d..ed08635382d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> @@ -312,6 +312,16 @@ TEST_UNIFORM_Z (mul_1op1_u16_z_tied2, svuint16_t,
>               z0 = svmul_u16_z (p0, svdup_u16 (1), z0),
>               z0 = svmul_z (p0, svdup_u16 (1), z0))
> 
> +/*
> +** mul_1op1n_u16_z:
> +**   movprfx z0\.h, p0/z, z0\.h
> +**   mov     z0\.h, p0/m, w0
> +**   ret
> +*/
> +TEST_UNIFORM_ZX (mul_1op1n_u16_z, svuint16_t, uint16_t,
> +     z0 = svmul_n_u16_z (p0, svdup_u16 (1), x0),
> +     z0 = svmul_z (p0, svdup_u16 (1), x0))
> +
>  /*
>  ** mul_3_u16_z_tied1:
>  **   mov     (z[0-9]+\.h), #3
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> index e8f52c9d785..f82ac4269e8 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> @@ -312,6 +312,16 @@ TEST_UNIFORM_Z (mul_1op1_u32_z_tied2, svuint32_t,
>               z0 = svmul_u32_z (p0, svdup_u32 (1), z0),
>               z0 = svmul_z (p0, svdup_u32 (1), z0))
> 
> +/*
> +** mul_1op1n_u32_z:
> +**   movprfx z0\.s, p0/z, z0\.s
> +**   mov     z0\.s, p0/m, w0
> +**   ret
> +*/
> +TEST_UNIFORM_ZX (mul_1op1n_u32_z, svuint32_t, uint32_t,
> +     z0 = svmul_n_u32_z (p0, svdup_u32 (1), x0),
> +     z0 = svmul_z (p0, svdup_u32 (1), x0))
> +
>  /*
>  ** mul_3_u32_z_tied1:
>  **   mov     (z[0-9]+\.s), #3
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> index 2ccdc3642c5..9f1bfff5fd2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> @@ -333,6 +333,16 @@ TEST_UNIFORM_Z (mul_1op1_u64_z_tied2, svuint64_t,
>               z0 = svmul_u64_z (p0, svdup_u64 (1), z0),
>               z0 = svmul_z (p0, svdup_u64 (1), z0))
> 
> +/*
> +** mul_1op1n_u64_z:
> +**   movprfx z0\.d, p0/z, z0\.d
> +**   mov     z0\.d, p0/m, x0
> +**   ret
> +*/
> +TEST_UNIFORM_ZX (mul_1op1n_u64_z, svuint64_t, uint64_t,
> +     z0 = svmul_n_u64_z (p0, svdup_u64 (1), x0),
> +     z0 = svmul_z (p0, svdup_u64 (1), x0))
> +
>  /*
>  ** mul_2_u64_z_tied1:
>  **   movprfx z0.d, p0/z, z0.d
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> index 8e53a4821f0..b2c1edf5ff8 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> @@ -312,6 +312,16 @@ TEST_UNIFORM_Z (mul_1op1_u8_z_tied2, svuint8_t,
>               z0 = svmul_u8_z (p0, svdup_u8 (1), z0),
>               z0 = svmul_z (p0, svdup_u8 (1), z0))
> 
> +/*
> +** mul_1op1n_u8_z:
> +**   movprfx z0\.b, p0/z, z0\.b
> +**   mov     z0\.b, p0/m, w0
> +**   ret
> +*/
> +TEST_UNIFORM_ZX (mul_1op1n_u8_z, svuint8_t, uint8_t,
> +     z0 = svmul_n_u8_z (p0, svdup_u8 (1), x0),
> +     z0 = svmul_z (p0, svdup_u8 (1), x0))
> +
>  /*
>  ** mul_3_u8_z_tied1:
>  **   mov     (z[0-9]+\.b), #3
> --
> 2.34.1
