Hi all, We've been asked to optimise the testcase in this patch of a 64-bit ADDP with the low and high halves of the same 128-bit vector. This can be done by a single .4s ADDP followed by just reading the bottom 64 bits. A splitter for this is quite straightforward now that all the vec_concat stuff is collapsed by simplify-rtx.
With this patch we generate a single: addp v0.4s, v0.4s, v0.4s instead of: dup d31, v0.d[1] addp v0.2s, v0.2s, v31.2s ret Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. Pushing to trunk. Thanks, Kyrill gcc/ChangeLog: * config/aarch64/aarch64-simd.md (*aarch64_addp_same_reg<mode>): New define_insn_and_split. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/addp-same-low_1.c: New test.
addp-q.patch
Description: addp-q.patch