Christophe Lyon <christophe.l...@linaro.org> writes:
> On Tue, 3 Jul 2018 at 12:02, Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>> > On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >>
>> >> Richard Sandiford <richard.sandif...@arm.com> writes:
>> >> > This patch is the main part of PR85694. The aim is to recognise at least:
>> >> >
>> >> >   signed char *a, *b, *c;
>> >> >   ...
>> >> >   for (int i = 0; i < 2048; i++)
>> >> >     c[i] = (a[i] + b[i]) >> 1;
>> >> >
>> >> > as an over-widening pattern, since the addition and shift can be done
>> >> > on shorts rather than ints. However, it ended up being a lot more
>> >> > general than that.
>> >> >
>> >> > The current over-widening pattern detection is limited to a few simple
>> >> > cases: logical ops with immediate second operands, and shifts by a
>> >> > constant. These cases are enough for common pixel-format conversion
>> >> > and can be detected in a peephole way.
>> >> >
>> >> > The loop above requires two generalisations of the current code: support
>> >> > for addition as well as logical ops, and support for non-constant second
>> >> > operands. These are harder to detect in the same peephole way, so the
>> >> > patch tries to take a more global approach.
>> >> >
>> >> > The idea is to get information about the minimum operation width
>> >> > in two ways:
>> >> >
>> >> > (1) by using the range information attached to the SSA_NAMEs
>> >> >     (effectively a forward walk, since the range info is
>> >> >     context-independent).
>> >> >
>> >> > (2) by back-propagating the number of output bits required by
>> >> >     users of the result.
>> >> >
>> >> > As explained in the comments, there's a balance to be struck between
>> >> > narrowing an individual operation and fitting in with the surrounding
>> >> > code. The approach is pretty conservative: if we could narrow an
>> >> > operation to N bits without changing its semantics, it's OK to do
>> >> > that if:
>> >> >
>> >> > - no operations later in the chain require more than N bits; or
>> >> >
>> >> > - all internally-defined inputs are extended from N bits or fewer,
>> >> >   and at least one of them is single-use.
>> >> >
>> >> > See the comments for the rationale.
>> >> >
>> >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
>> >> > since the code seemed more readable without.
>> >> >
>> >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?
>> >>
>> >> Here's a version rebased on top of current trunk. Changes from last time:
>> >>
>> >> - reintroduce dump_generic_expr_loc, with the obvious change to the
>> >>   prototype
>> >>
>> >> - fix a typo in a comment
>> >>
>> >> - use vect_element_precision from the new version of 12/n.
>> >>
>> >> Tested as before. OK to install?
>> >
>> > OK.
>>
>> Thanks. For the record, here's what I installed (updated on top of
>> Dave's recent patch, and with an obvious fix to vect-widen-mult-u8-u32.c).
>>
>> Richard
>>
> Hi,
>
> It seems the new bb-slp-over-widen tests lack a -fdump option:
> gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects : dump file
> does not exist
> UNRESOLVED: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "basic block vectorized" 2

I've applied the following as obvious.

Richard


2018-07-04  Richard Sandiford  <richard.sandif...@arm.com>

gcc/testsuite/
        * gcc.dg/vect/bb-slp-over-widen-1.c: Fix name of dump file for
        final scan test.
        * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c 2018-07-03 10:59:30.480481417 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c 2018-07-04 08:16:36.210113069 +0100
@@ -63,4 +63,4 @@ main (void)

 /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c 2018-07-03 10:59:30.480481417 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c 2018-07-04 08:16:36.210113069 +0100
@@ -62,4 +62,4 @@ main (void)

 /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
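
For anyone reading the thread without the patch to hand, the claim in the quoted description (that the addition and shift in the example loop can be done on shorts rather than ints) can be sanity-checked outside GCC. The following is only a minimal sketch and is not part of the patch or the testsuite: the names avg_int, avg_short and count_mismatches are hypothetical, and it assumes that right shifts of negative values are arithmetic, as they are with GCC.

```c
#include <stdint.h>

/* Reference semantics: C promotes both operands to int before the
   addition and the shift, as in c[i] = (a[i] + b[i]) >> 1.  */
static signed char
avg_int (signed char a, signed char b)
{
  return (signed char) ((a + b) >> 1);
}

/* Hypothetical narrowed form: the sum is truncated to 16 bits before
   the shift, modelling an operation carried out on shorts.  The sum of
   two signed chars lies in [-256, 254], so the truncation loses
   nothing.  */
static signed char
avg_short (signed char a, signed char b)
{
  int16_t sum = (int16_t) (a + b);
  return (signed char) (sum >> 1);
}

/* Count the input pairs for which the two forms disagree, checking
   every combination of signed char values.  */
int
count_mismatches (void)
{
  int mismatches = 0;
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      if (avg_int ((signed char) a, (signed char) b)
          != avg_short ((signed char) a, (signed char) b))
        mismatches++;
  return mismatches;
}
```

Calling count_mismatches from a small driver should report no disagreements. This only illustrates why 16 bits are enough for this one expression; it says nothing about how the vectorizer itself decides when narrowing is worthwhile.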