On Wed, Aug 11, 2021 at 3:58 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Wed, Aug 11, 2021 at 02:43:06PM +0800, liuhongt wrote:
> > Add define_insn_and_split to combine avx_vec_concatv16si/2 and
> > avx512f_zero_extendv16hiv16si2_1 since the latter already zero_extend
> > the upper bits, similar for other patterns which are related to
> > pmovzx{bw,wd,dq}.
> >
> > It will do optimization like
> >
> > -       vmovdqa %ymm0, %ymm0    # 7     [c=4 l=6]  avx_vec_concatv16si/2
> >         vpmovzxwd       %ymm0, %zmm0    # 22    [c=4 l=6]  avx512f_zero_extendv16hiv16si2
> >         ret             # 25    [c=0 l=1]  simple_return_internal
> >
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >       PR target/101846
> >       * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_2): New
> >       post_reload define_insn_and_split.
>
> The ChangeLog doesn't mention the newly added mode iterators, perhaps it
> should.
>
> >       (*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
> >       (*sse4_1_zero_extendv8qiv8hi2_4): Ditto.
> >       (*avx512f_zero_extendv16hiv16si2_2): Ditto.
> >       (*avx2_zero_extendv8hiv8si2_2): Ditto.
> >       (*sse4_1_zero_extendv4hiv4si2_4): Ditto.
> >       (*avx512f_zero_extendv8siv8di2_2): Ditto.
> >       (*avx2_zero_extendv4siv4di2_2): Ditto.
> >       (*sse4_1_zero_extendv2siv2di2_4): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       PR target/101846
> >       * gcc.target/i386/pr101846-1.c: New test.
> > ---
> >  gcc/config/i386/sse.md                     | 220 +++++++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/pr101846-1.c |  95 +++++++++
> >  2 files changed, 315 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101846-1.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index a46a2373547..6450c058458 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -673,8 +673,14 @@ (define_mode_iterator VI12_128 [V16QI V8HI])
> >  (define_mode_iterator VI14_128 [V16QI V4SI])
> >  (define_mode_iterator VI124_128 [V16QI V8HI V4SI])
> >  (define_mode_iterator VI24_128 [V8HI V4SI])
> > +(define_mode_iterator VI128_128 [V16QI V8HI V2DI])
>
> And this mode iterator isn't used anywhere in the patch it seems.
>
> Otherwise LGTM, although it fixes just particular, though perhaps very
> important, cases, for detecting generally that some operations on
> a vector aren't needed because following permutation that uses it never
> reads those elements is something that would need to be done on gimple.
>
> Would it be possible to handle also the similar pmovzx{bd,wq,bq} cases?

Yes. For testcase bar, vec_perm can be implemented as vpmovdw and
vinserti64x4, and the latter instruction will be optimized away since
the upper bits are never used. I'm working on a patch.

>
> Jakub
>
--
BR,
Hongtao
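
For readers following the thread, here is a minimal scalar sketch in C of why the `vmovdqa %ymm0, %ymm0` in the quoted assembly can be dropped. It is a model of the lane semantics only, not GCC code and not the contents of pr101846-1.c. It assumes the move's only effect is to clear the upper 256 bits of the 512-bit register. The function names (`clear_upper_half`, `zero_extend_low_half`, `clear_is_redundant`, `self_check`) are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES16 32 /* a 512-bit register viewed as 32 x 16-bit lanes */
#define LANES32 16 /* the same register viewed as 16 x 32-bit lanes */

/* Assumed model of the vec_concat with zero (the vmovdqa in the quoted
   assembly): keep the low 256 bits, clear the high 256 bits. */
void clear_upper_half(uint16_t reg[LANES16])
{
    memset(reg + LANES16 / 2, 0, (LANES16 / 2) * sizeof reg[0]);
}

/* Model of vpmovzxwd ymm -> zmm: zero-extend the 16 low 16-bit lanes
   into 16 32-bit lanes.  The 16 upper source lanes are never read. */
void zero_extend_low_half(const uint16_t src[LANES16], uint32_t dst[LANES32])
{
    for (int i = 0; i < LANES32; i++)
        dst[i] = (uint32_t)src[i];
}

/* Returns 1 when skipping the clearing step gives the same result. */
int clear_is_redundant(const uint16_t input[LANES16])
{
    uint16_t cleared[LANES16];
    uint32_t with_clear[LANES32];
    uint32_t without_clear[LANES32];

    memcpy(cleared, input, sizeof cleared);
    clear_upper_half(cleared);
    zero_extend_low_half(cleared, with_clear);
    zero_extend_low_half(input, without_clear);
    return memcmp(with_clear, without_clear, sizeof with_clear) == 0;
}

/* Hypothetical self-check: fill every lane, including the upper half,
   with non-zero data and confirm the clearing step changes nothing. */
void self_check(void)
{
    uint16_t input[LANES16];

    for (int i = 0; i < LANES16; i++)
        input[i] = (uint16_t)(0x8000u + (unsigned)i * 37u);
    assert(clear_is_redundant(input));
}
```

Calling `self_check()` from a `main` exercises the model. The same reasoning would apply to the other pmovzx widths listed in the ChangeLog, with different lane counts.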