On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> Hi,
>
> As PR90332 shows, the current scalar epilogue peeling for gaps
> elimination requires expected vec_init optab with two half size
> vector mode.  On Power, we don't support vector mode like V8QI,
> so can't support optab like vec_initv16qiv8qi.  But we want to
> leverage existing scalar mode like DI to init the desirable
> vector mode.  This patch is to extend the existing support for
> Power, as evaluated on Power9 we can see expected 1.9% speed up
> on SPEC2017 525.x264_r.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.
>
> Is it ok for trunk?

There's already code exercising such a case in vectorizable_load
(VMAT_STRIDED_SLP), which you could have factored out.

 vectype, bool slp,
      than the alignment boundary B.  Every vector access will
      be a multiple of B and so we are guaranteed to access a
      non-gap element in the same B-sized block.  */
+      machine_mode half_mode;
       if (overrun_p
           && gap < (vect_known_alignment_in_bytes (first_dr_info)
                     / vect_get_scalar_dr_size (first_dr_info)))
-        overrun_p = false;
-
+        {
+          overrun_p = false;
+          if (known_eq (nunits, (group_size - gap) * 2)
+              && known_eq (nunits, group_size)
+              && get_half_mode_for_vector (vectype, &half_mode))
+            DR_GROUP_HALF_MODE (first_stmt_info) = half_mode;
+        }

Why do you need to amend this case?

I don't like storing DR_GROUP_HALF_MODE very much. Later you need a
vector type anyway, and it looks cheap enough to recompute it where you
need it. If it is stored at all, it belongs to the stmt-info, not to
DR_GROUP.

I realize the original optimization was kind of a hack (and I was too
lazy to implement the integer mode construction path ...).

So, can you factor out the existing code into a function that takes a
vector type and a pieces size and returns the vector type to use for
construction? For V16QI and a pieces-size of 4 we'd get back either
V16QI (then construction from V4QI pieces should work) or V4SI (then
construction from SImode pieces should work). Optionally, provide that
piece type (SI / V4QI) as a secondary output.

Thanks,
Richard.

> BR,
> Kewen
> -----------
>
> gcc/ChangeLog
>
> 2020-MM-DD  Kewen Lin  <li...@gcc.gnu.org>
>
>     PR tree-optimization/90332
>     * gcc/tree-vectorizer.h (struct _stmt_vec_info): Add half_mode field.
>     (DR_GROUP_HALF_MODE): New macro.
>     * gcc/tree-vect-stmts.c (get_half_mode_for_vector): New function.
>     (get_group_load_store_type): Call get_half_mode_for_vector to query
>     target whether support half size mode and update DR_GROUP_HALF_MODE
>     if yes.
>     (vectorizable_load): Build appropriate vector type based on
>     DR_GROUP_HALF_MODE.
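
To make the helper requested in the review more concrete, here is a toy, standalone model in C of "vector type plus pieces size in, construction vector type out". It is only a sketch under assumptions and does not use GCC internals: struct vec_type, struct piece_type, composition_type and the two target_has_* predicates are hypothetical names, the target model (only 128-bit vector modes) is a simplification of the Power situation described in the patch mail, and the "pieces size" is read as the number of pieces (for V16QI and 4 this coincides with the elements-per-piece reading).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy vector type: element width in bits and number of lanes.
   V16QI is {8, 16}, V4SI is {32, 4}.  Hypothetical, not a GCC type.  */
struct vec_type { unsigned elt_bits; unsigned nunits; };

enum piece_kind { PIECE_VECTOR, PIECE_INTEGER };

/* Secondary output: what each construction piece looks like.  */
struct piece_type { enum piece_kind kind; unsigned elt_bits; unsigned nunits; };

/* ASSUMPTION: the toy target only has 128-bit vector modes,
   so V8QI and V4QI do not exist.  */
static bool target_has_vector_mode(unsigned elt_bits, unsigned nunits)
{
    return elt_bits * nunits == 128;
}

/* ASSUMPTION: the toy target has 8/16/32/64-bit integer modes.  */
static bool target_has_int_mode(unsigned bits)
{
    return bits == 8 || bits == 16 || bits == 32 || bits == 64;
}

/* Return true and fill CTOR/PIECE if VTYPE can be built from NPIECES
   equal pieces, preferring vector pieces over integer pieces.  */
static bool composition_type(struct vec_type vtype, unsigned npieces,
                             struct vec_type *ctor, struct piece_type *piece)
{
    assert(ctor != 0 && piece != 0);
    if (npieces == 0 || vtype.nunits % npieces != 0)
        return false;

    unsigned piece_nunits = vtype.nunits / npieces;
    unsigned piece_bits = piece_nunits * vtype.elt_bits;

    /* First choice: keep VTYPE and build it from sub-vector pieces.  */
    if (piece_nunits > 1
        && target_has_vector_mode(vtype.elt_bits, piece_nunits)) {
        *ctor = vtype;
        *piece = (struct piece_type){ PIECE_VECTOR, vtype.elt_bits,
                                      piece_nunits };
        return true;
    }

    /* Fallback: view each piece as one integer and build a vector of
       NPIECES such integers (e.g. V4SI from four SImode values).  */
    if (target_has_int_mode(piece_bits)
        && target_has_vector_mode(piece_bits, npieces)) {
        *ctor = (struct vec_type){ piece_bits, npieces };
        *piece = (struct piece_type){ PIECE_INTEGER, piece_bits, 1 };
        return true;
    }
    return false;
}
```

With this toy target, asking for V16QI in 4 pieces takes the integer fallback and yields a V4SI-shaped type with 32-bit integer pieces, while 2 pieces yields a V2DI-shaped type with 64-bit pieces, which corresponds to the DI-based initialization the patch mail asks for. A caller would then construct the returned type and view-convert the result back to the original vector type.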