On Tue, Jul 19, 2011 at 8:44 AM, Ira Rosen <ira.ro...@linaro.org> wrote:
> Hi,
>
> This patch tries to reduce over-promotion of vector operations that
> could be done with narrower elements, e.g., for
>
> char a;
> int b, c;
> short d;
>
> b = (int) a;
> c = b << 2;
> d = (short) c;
>
> we currently produce six vec_unpack_lo/hi_expr statements for the
> char->int conversion and then two vec_pack_trunc_expr for int->short.
> However, the shift can be performed on short, which needs only two
> vec_unpack_lo/hi_expr operations for the char->short conversion in
> this example.
>
> With this patch we detect such over-promoted sequences that start with
> a type promotion operation and end with a type demotion operation. The
> statements in between are checked to see whether they can be performed
> using a smaller type (this patch only adds support for shifts and bit
> operations with a constant). If a sequence is detected, we create a
> sequence of scalar pattern statements to be vectorized instead of the
> original one. Since there may be two pattern statements created for
> the same original statement (the operation itself, on an intermediate
> type, and a type promotion from a smaller type to the intermediate
> type for the non-constant operand), this patch adds a new field to
> struct _stmt_vec_info to keep that pattern def statement.
>
> Bootstrapped and tested on powerpc64-suse-linux.
> Comments are welcome.
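A scalar model of the example above shows why the shift can be done in 16 bits and why the statements between the promotion and the demotion still have to be checked. This is a minimal sketch, not the vectorizer's code: it assumes an unsigned char input and uses uint16_t in place of short so that every shift and conversion is well defined in C, and the function names are hypothetical. Because C promotes 16-bit operands to int, a 16-bit vector lane is modelled by truncating after each operation.

```c
#include <stdint.h>

/* Sequence as written: char -> int promotion, int shift, int -> short
   demotion (unsigned types assumed so all operations are well defined).  */
static uint16_t
wide_shift (uint8_t a)
{
  int32_t b = a;          /* b = (int) a;   */
  int32_t c = b << 2;     /* c = b << 2;    */
  return (uint16_t) c;    /* d = (short) c; */
}

/* The same computation in a 16-bit lane: the intermediate result is
   truncated to 16 bits, as it would be in a vector of shorts.  */
static uint16_t
narrow_shift (uint8_t a)
{
  uint16_t b = a;
  return (uint16_t) (b << 2);
}

/* Hypothetical sequence that must NOT be narrowed: the right shift needs
   bits that a 16-bit intermediate has already discarded.  */
static uint16_t
wide_shl_shr (uint8_t a)
{
  int32_t c = ((int32_t) a << 10) >> 8;
  return (uint16_t) c;
}

static uint16_t
narrow_shl_shr (uint8_t a)
{
  uint16_t t = (uint16_t) (a << 10);   /* truncated to 16 bits */
  return (uint16_t) (t >> 8);
}

/* Counts the 8-bit inputs on which the two versions disagree.  */
static int
count_mismatches (uint16_t (*f) (uint8_t), uint16_t (*g) (uint8_t))
{
  int n = 0;
  for (int a = 0; a <= UINT8_MAX; a++)
    if (f ((uint8_t) a) != g ((uint8_t) a))
      n++;
  return n;
}
```

Over all 256 inputs, the shift by 2 gives identical results in both widths, while the hypothetical shift left by 10 followed by a shift right by 8 disagrees for every input of 64 or more (192 of 256), because the left shift pushes bits out of the 16-bit lane that the right shift then needs.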
I wonder if we should do this optimization for scalars as well. We still
do some of that in the frontends' shorten_* functions, and I recently
added the capability to remove intermediate conversions to VRP. At least
it looks like VRP could be a good place to rewrite operations in narrower
types. That is, for a truncation statement

  d = (short) c;

see if that truncation is value-preserving by looking at the value range
of C, then check whether all related defs of C can be rewritten to that
truncated type until you reach only stmts that need no further processing
(not sure if that might be too expensive; at least I could imagine some
artificial testcases that would exhibit quadratic behavior). You'd need
to make VRP handle new SSA names during substitute_and_fold gracefully.

Thanks,
Richard.

> Thanks,
> Ira
>
> ChangeLog:
>
> 	* tree-vectorizer.h (struct _stmt_vec_info): Add new field for
> 	pattern def statement, and its access macro.
> 	(NUM_PATTERNS): Set to 5.
> 	* tree-vect-loop.c (vect_determine_vectorization_factor): Handle
> 	pattern def statement.
> 	(vect_transform_loop): Likewise.
> 	* tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add new
> 	function vect_recog_over_widening_pattern ().
> 	(vect_operation_fits_smaller_type): New function.
> 	(vect_recog_over_widening_pattern, vect_mark_pattern_stmts):
> 	Likewise.
> 	(vect_pattern_recog_1): Move the code that marks pattern
> 	statements to vect_mark_pattern_stmts (), and call it. Update
> 	documentation.
> 	* tree-vect-stmts.c (vect_supportable_shift): New function.
> 	(vect_analyze_stmt): Handle pattern def statement.
> 	(new_stmt_vec_info): Initialize pattern def statement.
>
> testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-over-widen-1.c: New test.
> 	* gcc.dg/vect/vect-over-widen-2.c: New test.
> 	* gcc.dg/vect/vect-over-widen-3.c: New test.
> 	* gcc.dg/vect/vect-over-widen-4.c: New test.
>
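The value-range check suggested in the reply above can be sketched with simple interval arithmetic. This is a hypothetical illustration, not GCC's VRP interface: struct range, range_shl and truncation_preserves_value are made-up names, and a is assumed to be a signed char with range [-128, 127].

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical closed interval [lo, hi]; not GCC's value range type.  */
struct range
{
  int64_t lo;
  int64_t hi;
};

/* Range of x << k, treated mathematically as x * 2^k.  Since the scale
   is positive, the bounds keep their order.  Assumes k is small enough
   that int64_t does not overflow.  */
static struct range
range_shl (struct range r, unsigned k)
{
  int64_t scale = (int64_t) 1 << k;
  struct range out = { r.lo * scale, r.hi * scale };
  return out;
}

/* Truncating to a type that holds [min, max] preserves every value in R
   exactly when R lies inside that interval.  */
static bool
truncation_preserves_value (struct range r, int64_t min, int64_t max)
{
  return r.lo >= min && r.hi <= max;
}
```

For c = (int) a << 2 the range is [-512, 508], which fits in a 16-bit short, so d = (short) c; preserves every value and the defs of c are candidates for rewriting in short. With a shift of 10 the range becomes [-131072, 130048], which does not fit, so the truncation is not value-preserving.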