On Thu, 21 Sep 2017, Jon Beniston wrote:

> Hi,
>
> The GCC vectorizer can't vectorize the following loop even though the target
> supports 2-lane SIMD left shift.
>
>   short a[256], b[256];
>   foo ()
>   {
>     int i;
>     for (i=0; i<256; i++)
>       { a[i] = b[i] << 4; }
>   }
>
> The reason seems to be that GCC promotes the source from short to int,
> performs the left shift on the int type, and finally demotes the result to
> convert it back to short. Below is the related tree dump:
>
>   _2 = (intD.1) _1;
>   # RANGE [-524288, 524272] NONZERO 4294967280
>   _3 = _2 << 4;
>   # RANGE [-32768, 32767] NONZERO 65520
>   _4 = (short intD.10) _3;
>   # .MEM_8 = VDEF <.MEM_14>
>   aD.1888[i_13] = _4;
>
> I checked tree-vect-patterns.c and found that there is already a pattern
> recognizer, "vect_recog_over_widening_pattern", to recognize such sequences.
>
> But vect_operation_fits_smaller_type only recognizes the sequences when the
> promoted type is 4 times wider than the original type. The reason seems to
> be that the original proposal at:
>
>   https://gcc.gnu.org/ml/gcc-patches/2011-07/msg01472.html
>
> was to handle the following sequences, where three types are involved and
> the widths satisfy T_PROMOTED = 2 * T_INTER = 4 * T_ORIG.
>
>   T_ORIG a;
>   T_PROMOTED b, c;
>   T_INTER d;
>
>   b = (T_PROMOTED) a;
>   c = b << 2;
>   d = (T_INTER) c;
>
> We could also handle the following sequence, where only two types are
> involved and T_PROMOTED = 2 * T_ORIG:
>
>   T_ORIG a, d;
>   T_PROMOTED b, c;
>
>   b = (T_PROMOTED) a;
>   c = b << 2;
>   d = (T_ORIG) c;
>
> Performing the left shift on T_ORIG directly should be equivalent to
> performing it on T_PROMOTED and then converting back to T_ORIG.
>
> x86-64/AArch64/PPC64 bootstrap OK (finished on gcc farms) and no regression
> on check-gcc/g++.
>
> gcc/
> 2017-09-21  Jon Beniston  <j...@beniston.com>
>
> 	* tree-vect-patterns.c (vect_operation_fits_smaller_type): Allow
> 	half_type for LSHIFT_EXPR.
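One way to check the equivalence claimed above for the 16-bit/32-bit case is to compare both computations exhaustively. This is an illustrative sketch only: the helper names shl_promoted, shl_narrow and count_mismatches are hypothetical, the values are handled as uint16_t/uint32_t bit patterns so that every shift is well defined in C, and a 16-bit SIMD lane is emulated by repeated doubling modulo 2^16.

```c
#include <stdint.h>
#include <assert.h> /* used by callers that check count_mismatches() */

/* Shift done on the promoted type, then truncated back to 16 bits
   (mirrors _2 = (int) _1; _3 = _2 << 4; _4 = (short) _3). */
static uint16_t shl_promoted(uint16_t x, unsigned s)
{
    /* Sign-extend the 16-bit pattern to 32 bits, as (int) of a short does. */
    uint32_t wide = (x & 0x8000u) ? ((uint32_t)x | 0xFFFF0000u) : (uint32_t)x;
    /* Shift in the wide type, then keep only the low 16 bits. */
    return (uint16_t)(wide << s);
}

/* Hypothetical 16-bit lane shift: s doublings, each wrapping modulo 2^16. */
static uint16_t shl_narrow(uint16_t x, unsigned s)
{
    for (unsigned i = 0; i < s; i++)
        x = (uint16_t)(x * 2u);
    return x;
}

/* Count (value, shift) pairs where the two computations disagree. */
static unsigned long count_mismatches(void)
{
    unsigned long bad = 0;
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        for (unsigned s = 0; s < 16; s++)
            if (shl_promoted((uint16_t)v, s) != shl_narrow((uint16_t)v, s))
                bad++;
    return bad;
}
```

count_mismatches() should return 0: sign extension only adds bits above bit 15, and a left shift never moves bits downward, so the low 16 bits come out the same either way. The argument relies on the truncated value being the final result; it says nothing about a later right shift of the wide value, which is the caveat raised in the reply.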
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index cdad261..0abf37c 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -1318,7 +1318,12 @@ vect_operation_fits_smaller_type (gimple *stmt, tree def, tree *new_type,
>          break;
>
>        case LSHIFT_EXPR:
> -        /* Try intermediate type - HALF_TYPE is not enough for sure.  */
> +        /* Try half_type.  */
> +        if (TYPE_PRECISION (type) == TYPE_PRECISION (half_type) * 2
> +            && vect_supportable_shift (code, half_type))
> +          break;
> +
> +        /* Try intermediate type.  */
>          if (TYPE_PRECISION (type) < (TYPE_PRECISION (half_type) * 4))
>            return false;
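The effect of the patch on the LSHIFT_EXPR case can be modelled as a small decision function. This is a simplified sketch, not GCC code: pick_lshift_precision and its parameters are hypothetical stand-ins for TYPE_PRECISION (type), TYPE_PRECISION (half_type) and the result of vect_supportable_shift (code, half_type), and any further checks the real function performs on the intermediate type are omitted.

```c
#include <stdbool.h>
#include <assert.h> /* used by callers that check the results */

/* Hypothetical model of the patched LSHIFT_EXPR case.
   Returns the precision to use for the narrowed shift, or 0 to reject. */
static unsigned pick_lshift_precision(unsigned type_prec, unsigned half_prec,
                                      bool half_shift_supported)
{
    /* New in the patch: use half_type when TYPE is exactly twice as wide
       and the target can shift in half_type. */
    if (type_prec == half_prec * 2 && half_shift_supported)
        return half_prec;

    /* Pre-existing path: an intermediate type needs TYPE to be at least
       4 times as wide as HALF_TYPE. */
    if (type_prec < half_prec * 4)
        return 0;

    /* Candidate intermediate type (further checks omitted in this model). */
    return half_prec * 2;
}
```

For the short/int case in the report, pick_lshift_precision (32, 16, true) picks 16 bits, where the unpatched code would have rejected it because 32 < 4 * 16. The model only compares precisions; nothing in it looks at the statements that later use the shifted value.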

I haven't dug deeply into this "interesting" function, but this case is only
valid if type == final type and if the result is not shifted back.
vect_recog_over_widening_pattern works on a whole sequence of stmts after
all, thus

  b = (T_PROMOTED) a;
  c = b << 2;
  d = c >> 2;
  e = (T_ORIG) d;

would be miscompiled by your new case.

Richard.
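To make the concern concrete, here is a hypothetical sketch of the kind of sequence the reply warns about: a value is widened, shifted left by 2, shifted back right by 2, and only then narrowed. Unsigned 16/32-bit types are used (an assumption; the thread's example uses signed short) so that every shift is well defined in C, and wide_sequence and narrowed_sequence are illustrative names.

```c
#include <stdint.h>
#include <assert.h> /* used by callers that compare the two results */

/* Original semantics: widen, shift left, shift right, then narrow. */
static uint16_t wide_sequence(uint16_t a)
{
    uint32_t b = a;       /* b = (T_PROMOTED) a */
    uint32_t c = b << 2;  /* c = b << 2 (bits 14-15 move to 16-17, kept) */
    uint32_t d = c >> 2;  /* d = c >> 2 (those bits come back) */
    return (uint16_t)d;   /* e = (T_ORIG) d */
}

/* Hypothetical narrowed version: both shifts done in a 16-bit lane. */
static uint16_t narrowed_sequence(uint16_t a)
{
    uint16_t c = (uint16_t)(a << 2); /* bits 14-15 are truncated away here */
    return (uint16_t)(c >> 2);
}
```

For a = 0x4000 the wide sequence returns 0x4000, because bit 14 moves to bit 16 and back without being lost, while the narrowed sequence truncates it and returns 0. Inputs below 0x4000 give the same result in both versions.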