Hi all,

This patch converts the patterns for the integer widen and pairwise-add
instructions to standard RTL operations. The pairwise addition within a
vector can be represented as an addition of two vec_selects, one selecting
the even elements and one selecting the odd elements. Thus for the
intrinsic vpaddlq_s8 we can generate:

(set (reg:V8HI 92)
    (plus:V8HI (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
            (parallel [
                    (const_int 0 [0])
                    (const_int 2 [0x2])
                    (const_int 4 [0x4])
                    (const_int 6 [0x6])
                    (const_int 8 [0x8])
                    (const_int 10 [0xa])
                    (const_int 12 [0xc])
                    (const_int 14 [0xe])
                ]))
        (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
            (parallel [
                    (const_int 1 [0x1])
                    (const_int 3 [0x3])
                    (const_int 5 [0x5])
                    (const_int 7 [0x7])
                    (const_int 9 [0x9])
                    (const_int 11 [0xb])
                    (const_int 13 [0xd])
                    (const_int 15 [0xf])
                ]))))
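For readers less familiar with the RTL, here is a rough scalar model in C of what the expression above computes, together with the accumulating variant. It is only a sketch: the function names model_vpaddlq_s8 and model_vpadalq_s8 are made up for illustration and are not part of the patch, and lane numbering simply follows the vec_select indices shown in the RTL.

```c
#include <stdint.h>

/* Hypothetical scalar model of the vpaddlq_s8 RTL: 16 x s8 -> 8 x s16.  */
void
model_vpaddlq_s8 (const int8_t a[16], int16_t out[8])
{
  int16_t wide[16];

  /* sign_extend:V16HI of the V16QI input.  */
  for (int i = 0; i < 16; i++)
    wide[i] = (int16_t) a[i];

  /* plus of the even-lane vec_select and the odd-lane vec_select.
     The sum of two sign-extended bytes always fits in 16 bits.  */
  for (int i = 0; i < 8; i++)
    out[i] = (int16_t) (wide[2 * i] + wide[2 * i + 1]);
}

/* Hypothetical model of the accumulating form: the same pairwise sum
   wrapped in one more PLUS with the accumulator.  The instruction wraps
   modulo 2^16; this sketch assumes the accumulated values stay in the
   int16_t range.  */
void
model_vpadalq_s8 (const int16_t acc[8], const int8_t a[16], int16_t out[8])
{
  int16_t pairwise[8];

  model_vpaddlq_s8 (a, pairwise);
  for (int i = 0; i < 8; i++)
    out[i] = (int16_t) (acc[i] + pairwise[i]);
}
```

For example, the input bytes 1, 2, 3, 4 in lanes 0 to 3 give 3 and 7 in result lanes 0 and 1.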
The accumulating forms are handled similarly, with an extra outer PLUS for
the accumulation. We already have the handy helper functions
aarch64_stepped_int_parallel_p and aarch64_gen_stepped_int_parallel defined
in aarch64.cc that we can make use of to define the right predicate for the
VEC_SELECT PARALLEL.

This patch allows us to remove some code iterators and the UNSPEC
definitions for SADDLP and UADDLP. UNSPEC_UADALP and UNSPEC_SADALP are
retained because they are still used by SVE2 patterns.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.

Thanks,
Kyrill

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (aarch64_<sur>adalp<mode>): Delete.
	(aarch64_<su>adalp<mode>): New define_expand.
	(*aarch64_<su>adalp<mode><vczle><vczbe>_insn): New define_insn.
	(aarch64_<su>addlp<mode>): Convert to define_expand.
	(*aarch64_<su>addlp<mode><vczle><vczbe>_insn): New define_insn.
	* config/aarch64/iterators.md (UNSPEC_SADDLP, UNSPEC_UADDLP): Delete.
	(ADALP): Likewise.
	(USADDLP): Likewise.
	* config/aarch64/predicates.md (vect_par_cnst_even_or_odd_half):
	Define.
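As an aside on the VEC_SELECT PARALLEL predicate mentioned above: the property it needs to accept can be sketched on a plain array of lane indices. The C below is only an illustration under that assumption. The name is_even_or_odd_half is hypothetical, it works on ints rather than rtx, and it is not the code in predicates.md or aarch64.cc.

```c
#include <stdbool.h>

/* Hypothetical model: return true if IDX[0..N-1] selects either all even
   lanes (0, 2, 4, ...) or all odd lanes (1, 3, 5, ...) of a vector with
   2 * N lanes, i.e. a series starting at 0 or 1 with a step of 2.  */
bool
is_even_or_odd_half (const int *idx, int n)
{
  if (n <= 0)
    return false;

  /* The series must start at lane 0 (even half) or lane 1 (odd half).  */
  if (idx[0] != 0 && idx[0] != 1)
    return false;

  /* Every following index must continue the series with step 2.  */
  for (int i = 1; i < n; i++)
    if (idx[i] != idx[0] + 2 * i)
      return false;

  return true;
}
```

With this model, both index lists from the vpaddlq_s8 RTL (0, 2, ..., 14 and 1, 3, ..., 15) are accepted, while a list such as 0, 2, 4, 7 is rejected.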
adalp.patch