Hi all,

We've been asked to optimise the testcase in this patch of a 64-bit ADDP with
the low and high halves of the same 128-bit vector. This can be done by a
single .4s ADDP followed by just reading the bottom 64 bits. A splitter for
this is quite straightforward now that all the vec_concat stuff is collapsed
by simplify-rtx.

With this patch we generate a single:
        addp    v0.4s, v0.4s, v0.4s
instead of:
        dup     d31, v0.d[1]
        addp    v0.2s, v0.2s, v31.2s
        ret

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md (*aarch64_addp_same_reg<mode>):
        New define_insn_and_split.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/simd/addp-same-low_1.c: New test.

Attachment: addp-q.patch
Description: addp-q.patch

Reply via email to