On Fri, Dec 23, 2022 at 12:19 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch enhances x86's STV pass to handle VEC_SELECT during general
> scalar chain conversion, performing SImode scalar extraction from V4SI
> and DImode scalar extraction from V2DI vector registers.
>
> The motivating test case from bugzilla is:
>
> typedef unsigned int v4si __attribute__((vector_size(16)));
>
> unsigned int f (v4si a, v4si b)
> {
>   a[0] += b[0];
>   return a[0] + a[1];
> }
>
> currently with -O2 -march=znver2 this generates:
>
>         vpextrd $1, %xmm0, %edx
>         vmovd   %xmm0, %eax
>         addl    %edx, %eax
>         vmovd   %xmm1, %edx
>         addl    %edx, %eax
>         ret
>
> which performs three transfers from the vector unit to the scalar unit,
> and performs the two additions there.  With this patch, we now generate:
>
>         vmovdqa %xmm0, %xmm2
>         vpshufd $85, %xmm0, %xmm0
>         vpaddd  %xmm0, %xmm2, %xmm0
>         vpaddd  %xmm1, %xmm0, %xmm0
>         vmovd   %xmm0, %eax
>         ret
>
> which performs the two additions in the vector unit, and then transfers
> the result to the scalar unit.  Technically the (cheap) movdqa isn't
> needed with better register allocation (or this could be cleaned up
> during peephole2), but even so this transform is still a win.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-12-22  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/107548
>         * config/i386/i386-features.cc (scalar_chain::add_insn): The
>         operands of a VEC_SELECT don't need to be added to the scalar chain.
>         (general_scalar_chain::compute_convert_gain) <case VEC_SELECT>:
>         Provide gains for performing STV on a VEC_SELECT.
>         (general_scalar_chain::convert_insn): Convert VEC_SELECT to pshufd,
>         psrldq or no-op.
>         (general_scalar_to_vector_candidate_p): Handle VEC_SELECT of a
>         single element from a vector register to a scalar register.
>
> gcc/testsuite/ChangeLog
>         PR target/107548
>         * gcc.target/i386/pr107548-1.c: New test V4SI case.
>         * gcc.target/i386/pr107548-1.c: New test V2DI case.

LGTM. Thanks, Uros.