On Tue, Jul 19, 2011 at 8:44 AM, Ira Rosen <ira.ro...@linaro.org> wrote:
> Hi,
>
> This patch tries to reduce over-promotion of vector operations that
> could be done with narrower elements, e.g., for
>
> char a;
> int b, c;
> short d;
>
> b = (int) a;
> c = b << 2;
> d = (short) c;
>
> we currently produce six vec_unpack_lo/hi_expr statements for the
> char->int conversion and then two vec_pack_trunc_expr statements for
> the int->short conversion, even though the shift can be performed on
> short, which needs only two vec_unpack_lo/hi_expr operations for the
> char->short conversion in this example.
>
> With this patch we detect such over-promoted sequences that start with
> a type promotion operation and end with a type demotion operation. The
> statements in between are checked to see whether they can be performed
> using a smaller type (this patch only adds support for shifts and bit
> operations with a constant). If a sequence is detected, we create a
> sequence of scalar pattern statements to be vectorized instead of the
> original one.  Since there may be two pattern statements created for
> the same original statement - the operation itself (on an intermediate
> type) and a type promotion (from a smaller type to the intermediate
> type) for the non-constant operand - this patch adds a new field to
> struct _stmt_vec_info to keep that pattern def statement.
>
> Bootstrapped and tested on powerpc64-suse-linux.
> Comments are welcome.
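
For reference, the equivalence the patch relies on can be checked on
scalars with a minimal sketch like the one below.  This is not taken
from the patch or its testcases: the function names are made up for
illustration, and unsigned types are assumed so that the left shift is
well defined in ISO C.

```c
#include <assert.h>
#include <stddef.h>

/* As written in the source: widen to int, shift, truncate to short.  */
static void shift_wide(const unsigned char *a, unsigned short *d, size_t n)
{
  assert(a != NULL && d != NULL);
  for (size_t i = 0; i < n; i++)
    {
      unsigned int b = (unsigned int) a[i];
      unsigned int c = b << 2;
      d[i] = (unsigned short) c;
    }
}

/* Narrowed form: only a char->short promotion, shift on 16 bits.  */
static void shift_narrow(const unsigned char *a, unsigned short *d, size_t n)
{
  assert(a != NULL && d != NULL);
  for (size_t i = 0; i < n; i++)
    {
      unsigned short b = (unsigned short) a[i];
      d[i] = (unsigned short) (b << 2);
    }
}

/* Hypothetical helper: counts inputs on which the two forms disagree,
   trying every possible unsigned char value.  */
static int count_mismatches(void)
{
  unsigned char a[256];
  unsigned short wide[256], narrow[256];
  int mismatches = 0;

  for (int i = 0; i < 256; i++)
    a[i] = (unsigned char) i;

  shift_wide(a, wide, 256);
  shift_narrow(a, narrow, 256);

  for (int i = 0; i < 256; i++)
    if (wide[i] != narrow[i])
      mismatches++;

  return mismatches;
}
```

Calling count_mismatches () from a small driver is enough to compare
the two loops over all 256 input values.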

I wonder if we should do this optimization for scalars as well.  We still
do some of that in the frontends' shorten_* functions, and I recently
added the capability to remove intermediate conversions to VRP.

At least it looks like VRP could be a good place to re-write operations
in narrower types.  That is, for a truncation statement

 d = (short) c;

see if that truncation is value-preserving by looking at the value-range
of c, then check whether all related defs of c can be rewritten to that
truncated type, until you reach only stmts that need no further processing
(not sure if that might be too expensive - at least I could imagine
some artificial testcases that would exhibit quadratic behavior).

You'd need to make VRP handle new SSA names during substitute_and_fold
gracefully.
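
To illustrate the value-range check, here is a rough sketch.  It is not
GCC code: struct range, range_shl and truncation_preserves_value are
hypothetical names, and the range arithmetic assumes non-negative
values that do not overflow long.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical closed interval [min, max]; not GCC's value-range type.  */
struct range { long min; long max; };

/* Range of (x << k) for x in r.  Assumes r.min >= 0 and that the
   shifted bounds still fit in long.  */
static struct range range_shl(struct range r, unsigned k)
{
  assert(r.min >= 0 && r.min <= r.max);
  assert(k < sizeof(long) * CHAR_BIT - 1 && r.max <= (LONG_MAX >> k));
  struct range out = { r.min << k, r.max << k };
  return out;
}

/* True if every value in r is representable in [tmin, tmax], i.e. a
   truncation to that type does not change the value.  */
static bool truncation_preserves_value(struct range r, long tmin, long tmax)
{
  return r.min >= tmin && r.max <= tmax;
}
```

With an unsigned char input, c = b << 2 has range [0, 1020], so the
check against [SHRT_MIN, SHRT_MAX] succeeds; with an unsigned short
input the shifted range reaches 262140 and the check fails.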

Thanks,
Richard.

> Thanks,
> Ira
>
> ChangeLog:
>
>   * tree-vectorizer.h (struct _stmt_vec_info): Add new field for
>   pattern def statement, and its access macro.
>   (NUM_PATTERNS): Set to 5.
>   * tree-vect-loop.c (vect_determine_vectorization_factor): Handle
>   pattern def statement.
>   (vect_transform_loop): Likewise.
>   * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add new
>   function vect_recog_over_widening_pattern ().
>   (vect_operation_fits_smaller_type): New function.
>   (vect_recog_over_widening_pattern, vect_mark_pattern_stmts):
>   Likewise.
>   (vect_pattern_recog_1): Move the code that marks pattern
>   statements to vect_mark_pattern_stmts (), and call it.  Update
>   documentation.
>   * tree-vect-stmts.c (vect_supportable_shift): New function.
>   (vect_analyze_stmt): Handle pattern def statement.
>   (new_stmt_vec_info): Initialize pattern def statement.
>
> testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-over-widen-1.c: New test.
>   * gcc.dg/vect/vect-over-widen-2.c: New test.
>   * gcc.dg/vect/vect-over-widen-3.c: New test.
>   * gcc.dg/vect/vect-over-widen-4.c: New test.
>
