On Mon, 2 Jul 2018 at 15:37, Richard Sandiford <richard.sandif...@arm.com> wrote: > > Christophe Lyon <christophe.l...@linaro.org> writes: > > On Fri, 29 Jun 2018 at 13:36, Richard Sandiford > > <richard.sandif...@arm.com> wrote: > >> > >> Richard Sandiford <richard.sandif...@arm.com> writes: > >> > This patch is the main part of PR85694. The aim is to recognise at > >> > least: > >> > > >> > signed char *a, *b, *c; > >> > ... > >> > for (int i = 0; i < 2048; i++) > >> > c[i] = (a[i] + b[i]) >> 1; > >> > > >> > as an over-widening pattern, since the addition and shift can be done > >> > on shorts rather than ints. However, it ended up being a lot more > >> > general than that. > >> > > >> > The current over-widening pattern detection is limited to a few simple > >> > cases: logical ops with immediate second operands, and shifts by a > >> > constant. These cases are enough for common pixel-format conversion > >> > and can be detected in a peephole way. > >> > > >> > The loop above requires two generalisations of the current code: support > >> > for addition as well as logical ops, and support for non-constant second > >> > operands. These are harder to detect in the same peephole way, so the > >> > patch tries to take a more global approach. > >> > > >> > The idea is to get information about the minimum operation width > >> > in two ways: > >> > > >> > (1) by using the range information attached to the SSA_NAMEs > >> > (effectively a forward walk, since the range info is > >> > context-independent). > >> > > >> > (2) by back-propagating the number of output bits required by > >> > users of the result. > >> > > >> > As explained in the comments, there's a balance to be struck between > >> > narrowing an individual operation and fitting in with the surrounding > >> > code. The approach is pretty conservative: if we could narrow an > >> > operation to N bits without changing its semantics, it's OK to do that > >> > if: > >> > > >> > - no operations later in the chain require more than N bits; or > >> > > >> > - all internally-defined inputs are extended from N bits or fewer, > >> > and at least one of them is single-use. > >> > > >> > See the comments for the rationale. > >> > > >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields > >> > since the code seemed more readable without. > >> > > >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > >> > >> Here's a version rebased on top of current trunk. Changes from last time: > >> > >> - reintroduce dump_generic_expr_loc, with the obvious change to the > >> prototype > >> > >> - fix a typo in a comment > >> > >> - use vect_element_precision from the new version of 12/n. > >> > >> Tested as before. OK to install? > >> > > > > Hi Richard, > > > > This patch introduces regressions on arm-none-linux-gnueabihf: > > gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times > > vect "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect > > "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times > > vect "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect > > "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8 > > gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect > > "vect_recog_widen_shift_pattern: detected" 8 > > gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1 > > gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect > > "vect_recog_widen_shift_pattern: detected" 1 > > gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1 > > gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect > > "vect_recog_widen_shift_pattern: detected" 1 > > gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects > > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2 > > gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect > > "vect_recog_widen_shift_pattern: detected" 2 > > Sorry about that, was caused by a stupid typo. I've applied the > below as obvious. > > (For the record, it was actually 12/n that caused this. 14/n hasn't > been applied yet.) > Sorry about the confusion, I probably messed up in gmail when searching for the mail containing the patch that caused the regression.
> Thanks, > Richard > > > 2018-07-02 Richard Sandiford <richard.sandif...@arm.com> > > gcc/ > * tree-vect-patterns.c (vect_recog_widen_shift_pattern): Fix typo > in dump string. > > Index: gcc/tree-vect-patterns.c > =================================================================== > --- gcc/tree-vect-patterns.c 2018-07-02 14:30:57.000000000 +0100 > +++ gcc/tree-vect-patterns.c 2018-07-02 14:30:57.383750450 +0100 > @@ -1739,7 +1739,7 @@ vect_recog_widen_shift_pattern (vec<gimp > { > return vect_recog_widen_op_pattern (stmts, type_out, LSHIFT_EXPR, > WIDEN_LSHIFT_EXPR, true, > - "vect_widen_shift_pattern"); > + "vect_recog_widen_shift_pattern"); > } > > /* Detect a rotate pattern wouldn't be otherwise vectorized: