I'd like to backport r15-1244 to GCC 14 and then apply the patch below. ===
This patch includes the testcase from r15-1399 plus a miminal fix for it, without the other proactive uses of force_subreg. We can backport other force_subreg calls later if they're shown to be needed. Boostrapped & regression-tested on aarch64-linux-gnu. I'll leave a bit of time for comments and then push on Monday if no-one has any objections before then. Richard gcc/ PR target/115464 * config/aarch64/aarch64-sve-builtins-base.cc (svset_neonq_impl::expand): Use force_subreg instead of lowpart_subreg. gcc/testsuite/ PR target/115464 * gcc.target/aarch64/sve/acle/general/pr115464_2.c: New test. --- gcc/config/aarch64/aarch64-sve-builtins-base.cc | 4 +++- .../gcc.target/aarch64/sve/acle/general/pr115464_2.c | 11 +++++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc index c9182594bc1..241a249503f 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc @@ -1185,7 +1185,9 @@ public: if (BYTES_BIG_ENDIAN) return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode)); insn_code icode = code_for_vcond_mask (mode, mode); - e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1])); + e.args[1] = force_subreg (mode, e.args[1], GET_MODE (e.args[1]), + subreg_lowpart_offset (mode, + GET_MODE (e.args[1]))); e.add_output_operand (icode); e.add_input_operand (icode, e.args[1]); e.add_input_operand (icode, e.args[0]); diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c new file mode 100644 index 00000000000..f561c34f732 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c @@ -0,0 +1,11 @@ +/* { dg-options "-O2" } */ + +#include <arm_neon.h> +#include <arm_sve.h> +#include <arm_neon_sve_bridge.h> + +svuint16_t +convolve4_4_x (uint16x8x2_t permute_tbl, svuint16_t a) +{ + return svset_neonq_u16 (a, permute_tbl.val[1]); +} -- 2.25.1