Hi all,

We can implement the halving-narrowing add/sub patterns with standard RTL codes as well, rather than relying on unspecs. This patch handles the low-part ones; the second patch does the high-part ones and removes the unspecs themselves.

The operation ADDHN on V4SI, for example, is represented as
  (truncate:V4HI ((src1:V4SI + src2:V4SI) >> 16))
and RADDHN as
  (truncate:V4HI ((src1:V4SI + src2:V4SI + (1 << 15)) >> 16)).

Taking this opportunity, I specified the patterns returning the narrow mode and annotated them with the <vczle><vczbe> define_subst rules to get the vec_concat-zero meta-patterns too. This allows us to simplify the expanders somewhat too.

Tests are added to check that the combinations work.
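For readers less familiar with these instructions, the ADDHN and RADDHN representations described above can be modelled per lane in scalar C. This is only an illustrative sketch and not part of the patch: the function names (addhn_lane, raddhn_lane, addhn_v4si, addhn_selfcheck) are made up for this example, and the model assumes the 32-bit addition wraps modulo 2^32 before the high half is taken.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one ADDHN lane on V4SI:
   (truncate:HI ((a + b) >> 16)).  The 32-bit sum wraps.  */
static uint16_t
addhn_lane (uint32_t a, uint32_t b)
{
  return (uint16_t) ((a + b) >> 16);
}

/* Hypothetical scalar model of one RADDHN lane: same as above, but with
   the rounding constant 1 << 15 added before the shift.  */
static uint16_t
raddhn_lane (uint32_t a, uint32_t b)
{
  return (uint16_t) ((a + b + (UINT32_C (1) << 15)) >> 16);
}

/* Apply the ADDHN lane model to four 32-bit lanes, producing four
   16-bit lanes (the "low part" V4HI result).  */
static void
addhn_v4si (uint16_t dst[4], const uint32_t a[4], const uint32_t b[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = addhn_lane (a[i], b[i]);
}

/* Small self-check showing the difference rounding makes.  */
static void
addhn_selfcheck (void)
{
  const uint32_t a[4] = { 0x00010000u, 0x00008000u, 0x0000ffffu, 0xffff0000u };
  const uint32_t b[4] = { 0x00020000u, 0x00000000u, 0x00000001u, 0x00010000u };
  uint16_t dst[4];

  addhn_v4si (dst, a, b);
  assert (dst[0] == 3);   /* 0x00030000 >> 16 */
  assert (dst[1] == 0);   /* 0x00008000 >> 16, truncating */
  assert (dst[2] == 1);   /* 0x00010000 >> 16 */
  assert (dst[3] == 0);   /* sum wraps to 0 */

  /* Rounding bumps a half-way value up to the next high half.  */
  assert (raddhn_lane (0x00008000u, 0) == 1);
  assert (raddhn_lane (0x00007fffu, 0) == 0);
}
```

The only difference between the two lane functions is the added constant, which is the value the rounding patterns expect as a vector of (1 << 15) in the V4SI case.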
Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf.
Pushing to trunk.

Thanks,
Kyrill

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_<sur><addsub>hn<mode>_insn_le):
	Delete.
	(aarch64_<optab>hn<mode>_insn<vczle><vczbe>): New define_insn.
	(aarch64_<sur><addsub>hn<mode>_insn_be): Delete.
	(aarch64_r<optab>hn<mode>_insn<vczle><vczbe>): New define_insn.
	(aarch64_<sur><addsub>hn<mode>): Delete.
	(aarch64_<optab>hn<mode>): New define_expand.
	(aarch64_r<optab>hn<mode>): Likewise.
	* config/aarch64/predicates.md (aarch64_simd_raddsubhn_imm_vec): New
	predicate.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/pr99195_4.c: New test.
addhn.patch