On Wed, Mar 18, 2020 at 11:39 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >
> > Hi,
> >
> > As PR90332 shows, the current scalar epilogue peeling for gaps
> > elimination requires a vec_init optab that builds the vector from
> > two half-size vector modes.  On Power, we don't support vector modes
> > like V8QI, so we can't support optabs like vec_initv16qiv8qi.  But we
> > want to leverage an existing scalar mode like DI to initialize the
> > desired vector mode.  This patch extends the existing support for
> > Power; evaluated on Power9, it gives the expected 1.9% speedup
> > on SPEC2017 525.x264_r.
> >
> > Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
> >
> > Is it ok for trunk?
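
For illustration only (this is not code from the patch): a minimal sketch
of the idea described above, assuming a 16-byte vector whose upper half
falls entirely in the gap.  Only the low 8 bytes are read, as one DI-sized
integer, and the vector is built from that piece plus a zero piece, so the
load never touches memory past the group.  The names V16QI and
load_low_half_v16qi are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 16 x 8-bit vector (V16QI).
using V16QI = std::array<std::uint8_t, 16>;

// Read only the first 8 bytes at SRC as one 64-bit (DI-sized) piece and
// build the 16-byte vector from the two pieces {piece, 0}.  The upper
// 8 lanes correspond to the gap and are zero-filled, not loaded.
V16QI load_low_half_v16qi(const std::uint8_t *src)
{
  assert(src != nullptr);

  std::uint64_t pieces[2] = {0, 0};
  std::memcpy(&pieces[0], src, sizeof pieces[0]);     // touches src[0..7] only

  V16QI result;
  std::memcpy(result.data(), pieces, sizeof pieces);  // view {DI, DI} as V16QI
  return result;
}
```

Because the two pieces are copied in memory order, lanes 0..7 hold the
loaded bytes and lanes 8..15 are zero regardless of endianness.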
>
> There's already code exercising such a case in vectorizable_load
> (VMAT_STRIDED_SLP) which you could have factored out.
>
>  vectype, bool slp,
>              than the alignment boundary B.  Every vector access will
>              be a multiple of B and so we are guaranteed to access a
>              non-gap element in the same B-sized block.  */
> +         machine_mode half_mode;
>           if (overrun_p
>               && gap < (vect_known_alignment_in_bytes (first_dr_info)
>                         / vect_get_scalar_dr_size (first_dr_info)))
> -           overrun_p = false;
> -
> +           {
> +             overrun_p = false;
> +             if (known_eq (nunits, (group_size - gap) * 2)
> +                 && known_eq (nunits, group_size)
> +                 && get_half_mode_for_vector (vectype, &half_mode))
> +               DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
> +           }
>
> why do you need to amend this case?
>
> I don't like storing DR_GROUP_HALF_MODE very much; later
> you need a vector type anyway, and it looks cheap enough to recompute
> it where you need it.  If it is stored at all, it belongs to the
> stmt-info rather than to DR_GROUP.
>
> I realize the original optimization was kind of a hack (and I was too
> lazy to implement the integer mode construction path ...).
>
> So, can you factor out the existing code into a function returning
> the vector type for construction for a vector type and a
> pieces size?  So for V16QI and a pieces-size of 4 we'd
> get either V16QI back (then construction from V4QI pieces
> should work) or V4SI (then construction from SImode pieces
> should work)?  Eventually as secondary output provide that
> piece type (SI / V4QI).
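
To make the request concrete, here is a rough sketch of the shape such a
helper could take.  It is a toy model, not GCC code: VecType, Composition,
Target and composition_type are hypothetical names, and the two Target
callbacks stand in for whatever optab and mode queries the real
implementation would use.

```cpp
#include <cassert>
#include <functional>

// Toy description of a vector type: NUNITS elements of ELEM_BITS bits each.
struct VecType {
  unsigned elem_bits;
  unsigned nunits;
};

// Result: the vector type to construct in, plus the piece type as a
// secondary output.  A piece with nunits == 1 stands for a scalar
// integer mode (e.g. {32, 1} for SI).
struct Composition {
  VecType vec;
  VecType piece;
};

// Hypothetical target queries (assumptions, not real GCC hooks).
struct Target {
  std::function<bool(VecType vec, VecType piece)> can_init_from_vector_pieces;
  std::function<bool(VecType vec)> can_init_from_int_elements;
};

// For VTYPE and a piece size of PIECE_ELEMS elements, pick the vector
// type to construct in.  Returns false if neither path is supported.
bool composition_type(const Target &t, VecType vtype, unsigned piece_elems,
                      Composition *out)
{
  assert(piece_elems > 0 && vtype.nunits % piece_elems == 0);

  // First choice: keep VTYPE and build it from sub-vector pieces,
  // e.g. V16QI from V4QI pieces.
  VecType vpiece{vtype.elem_bits, piece_elems};
  if (t.can_init_from_vector_pieces(vtype, vpiece)) {
    *out = Composition{vtype, vpiece};
    return true;
  }

  // Fallback: treat each piece as one integer of the same size and
  // build a vector of those integers, e.g. V4SI from SImode pieces.
  unsigned piece_bits = vtype.elem_bits * piece_elems;
  VecType ivec{piece_bits, vtype.nunits / piece_elems};
  if (t.can_init_from_int_elements(ivec)) {
    *out = Composition{ivec, VecType{piece_bits, 1}};
    return true;
  }

  return false;
}
```

With this shape, the half-vector case from the patch is just V16QI with a
piece size of 8, which would come back as V2DI built from DImode pieces on
a target without V8QI.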

Btw, why not implement the necessary vector init patterns?

> Thanks,
> Richard.
>
> > BR,
> > Kewen
> > -----------
> >
> > gcc/ChangeLog
> >
> > 2020-MM-DD  Kewen Lin  <li...@gcc.gnu.org>
> >
> >         PR tree-optimization/90332
> >         * gcc/tree-vectorizer.h (struct _stmt_vec_info): Add half_mode field.
> >         (DR_GROUP_HALF_MODE): New macro.
> >         * gcc/tree-vect-stmts.c (get_half_mode_for_vector): New function.
> >         (get_group_load_store_type): Call get_half_mode_for_vector to query
> >         whether the target supports the half-size mode, and update
> >         DR_GROUP_HALF_MODE if so.
> >         (vectorizable_load): Build appropriate vector type based on
> >         DR_GROUP_HALF_MODE.
