https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117978
--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jennifer Schmitz <jschm...@gcc.gnu.org>:

https://gcc.gnu.org/g:210d06502f22964c7214586c54f8eb54a6965bfd

commit r16-446-g210d06502f22964c7214586c54f8eb54a6965bfd
Author: Jennifer Schmitz <jschm...@nvidia.com>
Date:   Fri Feb 14 00:46:13 2025 -0800

    AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.

    SVE loads/stores using predicates that select the bottom 8, 16, 32, 64,
    or 128 bits of a register can be folded to ASIMD LDR/STR, thus avoiding
    the predicate.
    For example,

    svuint8_t foo (uint8_t *x) {
      return svld1 (svwhilelt_b8 (0, 16), x);
    }

    was previously compiled to:

    foo:
            ptrue   p3.b, vl16
            ld1b    z0.b, p3/z, [x0]
            ret

    and is now compiled to:

    foo:
            ldr     q0, [x0]
            ret

    The optimization is applied during the expand pass and was implemented
    by making the following changes to maskload<mode><vpred> and
    maskstore<mode><vpred>:
    - the existing define_insns were renamed and new define_expands for
      maskloads and maskstores were added with nonmemory_operand as
      predicate such that the SVE predicate matches both register operands
      and constant-vector operands.
    - if the SVE predicate is a constant vector and contains a pattern as
      described above, an ASIMD load/store is emitted instead of the SVE
      load/store.

    The patch implements the optimization for LD1 and ST1, for 8-bit,
    16-bit, 32-bit, 64-bit, and 128-bit moves, for all full SVE data
    vector modes.

    Follow-up patches for LD2/3/4 and ST2/3/4 and potentially partial SVE
    vector modes are planned.

    The patch was bootstrapped and tested on aarch64-linux-gnu, no
    regression.

    Signed-off-by: Jennifer Schmitz <jschm...@nvidia.com>

    gcc/
            PR target/117978
            * config/aarch64/aarch64-protos.h: Declare
            aarch64_emit_load_store_through_mode and aarch64_sve_maskloadstore.
            * config/aarch64/aarch64-sve.md
            (maskload<mode><vpred>): New define_expand folding maskloads
            with certain predicate patterns to ASIMD loads.
            (*aarch64_maskload<mode><vpred>): Renamed from
            maskload<mode><vpred>.
            (maskstore<mode><vpred>): New define_expand folding maskstores
            with certain predicate patterns to ASIMD stores.
            (*aarch64_maskstore<mode><vpred>): Renamed from
            maskstore<mode><vpred>.
            * config/aarch64/aarch64.cc
            (aarch64_emit_load_store_through_mode): New function emitting
            a load/store through subregs of a given mode.
            (aarch64_emit_sve_pred_move): Refactor to use
            aarch64_emit_load_store_through_mode.
            (aarch64_expand_maskloadstore): New function to emit ASIMD
            loads/stores for maskloads/stores with SVE predicates with VL1,
            VL2, VL4, VL8, or VL16 patterns.
            (aarch64_partial_ptrue_length): New function returning number of
            leading set bits in a predicate.

    gcc/testsuite/
            PR target/117978
            * gcc.target/aarch64/sve/acle/general/whilelt_5.c: Adjust expected
            outcome.
            * gcc.target/aarch64/sve/ldst_ptrue_pat_128_to_neon.c: New test.
            * gcc.target/aarch64/sve/while_7.c: Adjust expected outcome.
            * gcc.target/aarch64/sve/while_9.c: Adjust expected outcome.
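The folding condition described in the commit message can be modelled in plain C. A predicate qualifies if its active lanes are a leading run, and if that run times the element size covers exactly 8, 16, 32, 64 or 128 bits. That width then picks the ASIMD register (b, h, s, d or q) used by LDR/STR.

This is a hypothetical sketch, not GCC's implementation. The names leading_active_lanes and asimd_reg_letter are illustrative only. The predicate is assumed to be modelled as one bool per element, whereas real SVE predicates carry one bit per byte. The sketch also assumes that any active lane after the leading run disqualifies the predicate. The real aarch64_partial_ptrue_length and aarch64_expand_maskloadstore operate on RTL constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not GCC's aarch64_partial_ptrue_length): return the
   number of leading active lanes in PRED if every later lane is inactive,
   otherwise 0.  PRED models a constant predicate as one bool per element.  */
static size_t
leading_active_lanes (const bool *pred, size_t nlanes)
{
  assert (pred != NULL || nlanes == 0);
  size_t n = 0;
  while (n < nlanes && pred[n])
    n++;
  for (size_t i = n; i < nlanes; i++)
    if (pred[i])
      return 0;  /* Active lane after a gap: not a VL1..VL16-style prefix.  */
  return n;
}

/* Hypothetical helper: ASIMD register letter for an access covering ACTIVE
   elements of ELEM_BYTES bytes each, or '\0' if the covered width is not
   8, 16, 32, 64 or 128 bits (so no single LDR/STR can replace it).  */
static char
asimd_reg_letter (size_t elem_bytes, size_t active)
{
  switch (elem_bytes * 8 * active)
    {
    case 8:   return 'b';
    case 16:  return 'h';
    case 32:  return 's';
    case 64:  return 'd';
    case 128: return 'q';
    default:  return '\0';
    }
}
```

For the commit's example, svwhilelt_b8 (0, 16) gives 16 leading active byte lanes. That covers 128 bits, so asimd_reg_letter (1, 16) yields 'q', matching `ldr q0, [x0]`. A VL16 pattern on 16-bit elements would cover 256 bits and yields '\0' in this model.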