On Wed, Mar 18, 2020 at 11:39 AM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >
> > Hi,
> >
> > As PR90332 shows, the current scalar epilogue peeling for gaps
> > elimination requires expected vec_init optab with two half size
> > vector mode.  On Power, we don't support vector mode like V8QI,
> > so can't support optab like vec_initv16qiv8qi.  But we want to
> > leverage existing scalar mode like DI to init the desirable
> > vector mode.  This patch is to extend the existing support for
> > Power, as evaluated on Power9 we can see expected 1.9% speed up
> > on SPEC2017 525.x264_r.
> >
> > Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
> >
> > Is it ok for trunk?
>
> There's already code exercising such a case in vectorizable_load
> (VMAT_STRIDED_SLP) which you could have factored out.
>
>                              vectype, bool slp,
>              than the alignment boundary B.  Every vector access will
>              be a multiple of B and so we are guaranteed to access a
>              non-gap element in the same B-sized block.  */
> +         machine_mode half_mode;
>           if (overrun_p
>               && gap < (vect_known_alignment_in_bytes (first_dr_info)
>                         / vect_get_scalar_dr_size (first_dr_info)))
> -           overrun_p = false;
> -
> +           {
> +             overrun_p = false;
> +             if (known_eq (nunits, (group_size - gap) * 2)
> +                 && known_eq (nunits, group_size)
> +                 && get_half_mode_for_vector (vectype, &half_mode))
> +               DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
> +           }
>
> why do you need to amend this case?
>
> I don't like storing DR_GROUP_HALF_MODE very much, later
> you need a vector type and it looks cheap enough to recompute
> it where you need it?  Iff then it doesn't belong to DR_GROUP
> but to the stmt-info.
>
> I realize the original optimization was kind of a hack (and I was too
> lazy to implement the integer mode construction path ...).
>
> So, can you factor out the existing code into a function returning
> the vector type for construction for a vector type and a
> pieces size?  So for V16QI and a pieces-size of 4 we'd
> get either V16QI back (then construction from V4QI pieces
> should work) or V4SI (then construction from SImode pieces
> should work)?  Eventually as secondary output provide that
> piece type (SI / V4QI).
Btw, why not implement the necessary vector init patterns?

> Thanks,
> Richard.
>
> > BR,
> > Kewen
> > -----------
> >
> > gcc/ChangeLog
> >
> > 2020-MM-DD  Kewen Lin  <li...@gcc.gnu.org>
> >
> >         PR tree-optimization/90332
> >         * gcc/tree-vectorizer.h (struct _stmt_vec_info): Add half_mode
> >         field.
> >         (DR_GROUP_HALF_MODE): New macro.
> >         * gcc/tree-vect-stmts.c (get_half_mode_for_vector): New function.
> >         (get_group_load_store_type): Call get_half_mode_for_vector to query
> >         target whether support half size mode and update DR_GROUP_HALF_MODE
> >         if yes.
> >         (vectorizable_load): Build appropriate vector type based on
> >         DR_GROUP_HALF_MODE.
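
For illustration only, the helper requested in the review above (vector type
and pieces size in, vector type for construction out, piece type as a
secondary result) might be modelled roughly as below.  This is a standalone
toy sketch, not GCC code: VecType, Target, Composition and
composition_type_for_pieces are hypothetical names, the target is reduced to
two sets of supported modes, and "pieces size" is assumed to mean the number
of elements per piece.  It also assumes that a target with the sub-vector
mode can initialize the full vector from such pieces.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <utility>

// Toy model of a vector type: NUNITS elements of ELT_BITS bits each.
// Hypothetical; GCC itself works on tree types and machine modes.
struct VecType {
  unsigned nunits;
  unsigned elt_bits;
};

// Hypothetical target description: which vector and integer modes exist.
struct Target {
  std::set<std::pair<unsigned, unsigned>> vector_modes;  // (nunits, elt_bits)
  std::set<unsigned> int_modes;                          // widths in bits

  bool has_vector(unsigned nunits, unsigned elt_bits) const {
    return vector_modes.count(std::make_pair(nunits, elt_bits)) != 0;
  }
  bool has_int(unsigned bits) const { return int_modes.count(bits) != 0; }
};

// Result: the vector type to construct in, plus a description of one piece.
struct Composition {
  VecType vectype;       // e.g. V16QI or V4SI
  bool piece_is_vector;  // true: sub-vector piece, false: integer piece
  unsigned piece_bits;   // width of one piece in bits
};

// For vector type VT split into pieces of PIECE_ELTS elements each, pick a
// vector type the target can build from such pieces, or return nullopt.
std::optional<Composition>
composition_type_for_pieces(const Target &target, VecType vt,
                            unsigned piece_elts)
{
  assert(piece_elts > 0);
  if (vt.nunits % piece_elts != 0)
    return std::nullopt;

  unsigned piece_bits = piece_elts * vt.elt_bits;
  unsigned npieces = vt.nunits / piece_elts;

  // First choice: the sub-vector mode exists (e.g. V4QI), so keep VT.
  if (target.has_vector(piece_elts, vt.elt_bits))
    return Composition{vt, true, piece_bits};

  // Fallback: view each piece as an integer (e.g. SImode) and construct a
  // vector of NPIECES such integers (e.g. V4SI) instead.
  if (target.has_int(piece_bits) && target.has_vector(npieces, piece_bits))
    return Composition{VecType{npieces, piece_bits}, false, piece_bits};

  return std::nullopt;
}
```

With a target model that has V16QI, V8HI, V4SI and V2DI but no V4QI or V8QI,
asking for V16QI with 4 elements per piece selects the V4SI/SImode path, and
8 elements per piece selects V2DI/DImode, which corresponds to the half-size
case from the PR.  The caller would still need to view-convert the result
back to the original vector type.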