On Wed, Aug 11, 2021 at 3:58 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Wed, Aug 11, 2021 at 02:43:06PM +0800, liuhongt wrote:
> >   Add define_insn_and_split to combine avx_vec_concatv16si/2 and
> > avx512f_zero_extendv16hiv16si2_1 since the latter already zero_extend
> > the upper bits, similar for other patterns which are related to
> > pmovzx{bw,wd,dq}.
> >
> > It will do optimization like
> >
> > -       vmovdqa %ymm0, %ymm0    # 7     [c=4 l=6]  avx_vec_concatv16si/2
> >         vpmovzxwd       %ymm0, %zmm0    # 22    [c=4 l=6]  avx512f_zero_extendv16hiv16si2
> >         ret             # 25    [c=0 l=1]  simple_return_internal
> >
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >   Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >       PR target/101846
> >       * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_2): New
> >       post_reload define_insn_and_split.
>
> The ChangeLog doesn't mention the newly added mode iterators, perhaps it
> should.
>
> >       (*avx512bw_zero_extendv32qiv32hi2_2): Ditto.
> >       (*sse4_1_zero_extendv8qiv8hi2_4): Ditto.
> >       (*avx512f_zero_extendv16hiv16si2_2): Ditto.
> >       (*avx2_zero_extendv8hiv8si2_2): Ditto.
> >       (*sse4_1_zero_extendv4hiv4si2_4): Ditto.
> >       (*avx512f_zero_extendv8siv8di2_2): Ditto.
> >       (*avx2_zero_extendv4siv4di2_2): Ditto.
> >       (*sse4_1_zero_extendv2siv2di2_4): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       PR target/101846
> >       * gcc.target/i386/pr101846-1.c: New test.
> > ---
> >  gcc/config/i386/sse.md                     | 220 +++++++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/pr101846-1.c |  95 +++++++++
> >  2 files changed, 315 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr101846-1.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index a46a2373547..6450c058458 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -673,8 +673,14 @@ (define_mode_iterator VI12_128 [V16QI V8HI])
> >  (define_mode_iterator VI14_128 [V16QI V4SI])
> >  (define_mode_iterator VI124_128 [V16QI V8HI V4SI])
> >  (define_mode_iterator VI24_128 [V8HI V4SI])
> > +(define_mode_iterator VI128_128 [V16QI V8HI V2DI])
>
> And this mode iterator isn't used anywhere in the patch it seems.
>
> Otherwise LGTM, although it fixes just particular, though perhaps very
> important, cases, for detecting generally that some operations on
> a vector aren't needed because following permutation that uses it never
> reads those elements is something that would need to be done on gimple.
>
> Would it be possible to handle also the similar pmovzx{bd,wq,bq} cases?
Yes. For testcase bar, the vec_perm can be implemented as vpmovdw
followed by vinserti64x4, and the latter instruction will be optimized
away since the upper bits are never used.
I'm working on a patch.
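
For reference, the redundancy removed by the original patch can be
modeled in plain C. This is only a rough sketch of the vpmovzxwd case,
not code from the patch or from pr101846-1.c: the lane counts assume a
512-bit register, and all function names below are hypothetical. It
shows that the zero-extension reads only the low half of its source, so
clearing the upper half beforehand (the removed vmovdqa) cannot change
the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SRC_LANES 32  /* 16-bit lanes in a 512-bit register */
#define DST_LANES 16  /* 32-bit lanes in a 512-bit register */

/* Model of the vec_concat-with-zero step (the removed
   vmovdqa %ymm0, %ymm0): keep the low 256 bits and clear
   the upper 256 bits.  */
static void
zero_upper_half (uint16_t v[SRC_LANES])
{
  memset (v + SRC_LANES / 2, 0, (SRC_LANES / 2) * sizeof v[0]);
}

/* Model of vpmovzxwd %ymm, %zmm: zero-extend the 16 low 16-bit
   lanes to 32 bits each.  The upper 16 source lanes are never read.  */
static void
zero_extend_low_half (const uint16_t src[SRC_LANES],
                      uint32_t dst[DST_LANES])
{
  for (int i = 0; i < DST_LANES; i++)
    dst[i] = src[i];
}

/* Return 1 if zero-extending with and without the preceding
   zeroing step gives the same result for the given input.  */
static int
zeroing_is_redundant (const uint16_t in[SRC_LANES])
{
  uint16_t cleared[SRC_LANES];
  uint32_t with_zeroing[DST_LANES];
  uint32_t without_zeroing[DST_LANES];

  memcpy (cleared, in, sizeof cleared);
  zero_upper_half (cleared);
  zero_extend_low_half (cleared, with_zeroing);
  zero_extend_low_half (in, without_zeroing);
  return memcmp (with_zeroing, without_zeroing,
                 sizeof with_zeroing) == 0;
}

/* Example check on an input whose upper lanes are all non-zero.  */
static void
check_example (void)
{
  uint16_t in[SRC_LANES];

  for (int i = 0; i < SRC_LANES; i++)
    in[i] = (uint16_t) (0xff00u + i);
  assert (zeroing_is_redundant (in));
}
```

The same reasoning should carry over to the other pmovzx variants, with
only the lane counts and element widths changing.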
>
>         Jakub
>


-- 
BR,
Hongtao
