On Fri, Dec 23, 2022 at 12:19 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch enhances x86's STV pass to handle VEC_SELECT during general
> scalar chain conversion, performing SImode scalar extraction from V4SI
> and DImode scalar extraction from V2DI vector registers.
>
> The motivating test case from bugzilla is:
>
> typedef unsigned int v4si __attribute__((vector_size(16)));
>
> unsigned int f (v4si a, v4si b)
> {
>   a[0] += b[0];
>   return a[0] + a[1];
> }
>
> currently with -O2 -march=znver2 this generates:
>
>         vpextrd $1, %xmm0, %edx
>         vmovd   %xmm0, %eax
>         addl    %edx, %eax
>         vmovd   %xmm1, %edx
>         addl    %edx, %eax
>         ret
>
> which performs three transfers from the vector unit to the scalar unit,
> and performs the two additions there.  With this patch, we now generate:
>
>         vmovdqa %xmm0, %xmm2
>         vpshufd $85, %xmm0, %xmm0
>         vpaddd  %xmm0, %xmm2, %xmm0
>         vpaddd  %xmm1, %xmm0, %xmm0
>         vmovd   %xmm0, %eax
>         ret
>
> which performs the two additions in the vector unit, and then transfers
> the result to the scalar unit.  Technically the (cheap) vmovdqa would not
> be needed with better register allocation (or it could be cleaned up
> during peephole2), but even so this transform is still a win.
>
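
For readers following along, the before/after sequences quoted above can be approximated by hand with SSE2 intrinsics. This is only a sketch of the idea, not code from the patch: the function names f_scalar and f_vector are hypothetical, and it assumes an x86 target with SSE2 and a compiler (GCC or Clang) that accepts casts between same-sized vector types.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

typedef unsigned int v4si __attribute__((vector_size(16)));

/* Scalar form, same body as the test case from the PR. */
static unsigned int f_scalar(v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

/* Hypothetical hand-written equivalent of the new code sequence:
   both additions stay in the vector unit, and only the final
   result is moved to a general-purpose register.  */
static unsigned int f_vector(v4si a, v4si b)
{
  __m128i va = (__m128i)a;
  __m128i vb = (__m128i)b;
  /* pshufd $85 (0x55): copy element 1 into every lane. */
  __m128i a1 = _mm_shuffle_epi32(va, 0x55);
  /* Lane 0 now holds a[1] + a[0]. */
  __m128i sum = _mm_add_epi32(a1, va);
  /* Lane 0 now holds a[1] + a[0] + b[0]. */
  sum = _mm_add_epi32(vb, sum);
  /* Single vector-to-scalar transfer (movd). */
  return (unsigned int)_mm_cvtsi128_si32(sum);
}
```

Only lane 0 of the vector result is meaningful; the other lanes carry values the scalar code never asks for, which is harmless because just one element is extracted at the end.
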
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-12-22  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/107548
>         * config/i386/i386-features.cc (scalar_chain::add_insn): The
>         operands of a VEC_SELECT don't need to be added to the scalar chain.
>         (general_scalar_chain::compute_convert_gain) <case VEC_SELECT>:
>         Provide gains for performing STV on a VEC_SELECT.
>         (general_scalar_chain::convert_insn): Convert VEC_SELECT to pshufd,
>         psrldq or no-op.
>         (general_scalar_to_vector_candidate_p): Handle VEC_SELECT of a
>         single element from a vector register to a scalar register.
>
> gcc/testsuite/ChangeLog
>         PR target/107548
>         * gcc.target/i386/pr107548-1.c: New test V4SI case.
>         * gcc.target/i386/pr107548-2.c: New test V2DI case.
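
As a rough illustration of the "psrldq or no-op" entry above for the V2DI case, the sketch below extracts each 64-bit element by hand. It is an assumption about how the cases map, not code from the patch: the helper names extract_lo and extract_hi are hypothetical, and _mm_cvtsi128_si64 is only available on x86-64.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

typedef unsigned long long v2di __attribute__((vector_size(16)));

/* Element 0 already sits in the low 64 bits, so no shuffle or
   shift is needed before it is used or moved out.  */
static unsigned long long extract_lo(v2di v)
{
  return (unsigned long long)_mm_cvtsi128_si64((__m128i)v);
}

/* Element 1 is brought down to the low 64 bits with a whole-register
   byte shift (psrldq $8), after which it can be used like element 0.  */
static unsigned long long extract_hi(v2di v)
{
  __m128i shifted = _mm_srli_si128((__m128i)v, 8);
  return (unsigned long long)_mm_cvtsi128_si64(shifted);
}
```

In a converted chain the final move to a general-purpose register would happen only once, at the end of the chain; it appears in both helpers here just so that each can be checked in isolation.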

LGTM.

Thanks,
Uros.
