Re: [14/n] PR85694: Rework overwidening detection

Richard Sandiford Wed, 04 Jul 2018 00:19:30 -0700

Christophe Lyon <christophe.l...@linaro.org> writes:
> On Tue, 3 Jul 2018 at 12:02, Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>> > On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >>
>> >> Richard Sandiford <richard.sandif...@arm.com> writes:
>> >> > This patch is the main part of PR85694.  The aim is to recognise at least:
>> >> >
>> >> >   signed char *a, *b, *c;
>> >> >   ...
>> >> >   for (int i = 0; i < 2048; i++)
>> >> >     c[i] = (a[i] + b[i]) >> 1;
>> >> >
>> >> > as an over-widening pattern, since the addition and shift can be done
>> >> > on shorts rather than ints.  However, it ended up being a lot more
>> >> > general than that.
>> >> >
>> >> > The current over-widening pattern detection is limited to a few simple
>> >> > cases: logical ops with immediate second operands, and shifts by a
>> >> > constant.  These cases are enough for common pixel-format conversion
>> >> > and can be detected in a peephole way.
>> >> >
>> >> > The loop above requires two generalisations of the current code: support
>> >> > for addition as well as logical ops, and support for non-constant second
>> >> > operands.  These are harder to detect in the same peephole way, so the
>> >> > patch tries to take a more global approach.
>> >> >
>> >> > The idea is to get information about the minimum operation width
>> >> > in two ways:
>> >> >
>> >> > (1) by using the range information attached to the SSA_NAMEs
>> >> >     (effectively a forward walk, since the range info is
>> >> >     context-independent).
>> >> >
>> >> > (2) by back-propagating the number of output bits required by
>> >> >     users of the result.
>> >> >
>> >> > As explained in the comments, there's a balance to be struck between
>> >> > narrowing an individual operation and fitting in with the surrounding
>> >> > code.  The approach is pretty conservative: if we could narrow an
>> >> > operation to N bits without changing its semantics, it's OK to do that if:
>> >> >
>> >> > - no operations later in the chain require more than N bits; or
>> >> >
>> >> > - all internally-defined inputs are extended from N bits or fewer,
>> >> >   and at least one of them is single-use.
>> >> >
>> >> > See the comments for the rationale.
>> >> >
>> >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
>> >> > since the code seemed more readable without.
>> >> >
>> >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>> >>
>> >> Here's a version rebased on top of current trunk.  Changes from last time:
>> >>
>> >> - reintroduce dump_generic_expr_loc, with the obvious change to the
>> >>   prototype
>> >>
>> >> - fix a typo in a comment
>> >>
>> >> - use vect_element_precision from the new version of 12/n.
>> >>
>> >> Tested as before.  OK to install?
>> >
>> > OK.
>>
>> Thanks.  For the record, here's what I installed (updated on top of
>> Dave's recent patch, and with an obvious fix to vect-widen-mult-u8-u32.c).
>>
>> Richard
>>
> Hi,
>
> It seems the new bb-slp-over-widen tests lack a -fdump option:
> gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects : dump file does not exist
> UNRESOLVED: gcc.dg/vect/bb-slp-over-widen-2.c -flto -ffat-lto-objects scan-tree-dump-times vect "basic block vectorized" 2


I've applied the following as obvious.

Richard


2018-07-04  Richard Sandiford  <richard.sandif...@arm.com>

gcc/testsuite/
        * gcc.dg/vect/bb-slp-over-widen-1.c: Fix name of dump file for
        final scan test.
        * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c     2018-07-03 10:59:30.480481417 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-1.c     2018-07-04 08:16:36.210113069 +0100
@@ -63,4 +63,4 @@ main (void)
 
 /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
Index: gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c     2018-07-03 10:59:30.480481417 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-over-widen-2.c     2018-07-04 08:16:36.210113069 +0100
@@ -62,4 +62,4 @@ main (void)
 
 /* { dg-final { scan-tree-dump "demoting int to signed short" "slp2" { target { ! vect_widen_shift } } } } */
 /* { dg-final { scan-tree-dump "demoting int to unsigned short" "slp2" { target { ! vect_widen_shift } } } } */
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp2" } } */
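
For readers following the thread, the quoted claim that the addition and
shift in c[i] = (a[i] + b[i]) >> 1 "can be done on shorts rather than ints"
can be checked with a small standalone C sketch.  This is not part of the
patch or of the GCC testsuite; the names avg_wide, avg_narrow and
count_mismatches are hypothetical.  It assumes an 8-bit signed char and an
arithmetic right shift for negative values, which is what GCC implements.

```c
#include <limits.h>

/* What the source loop computes: operands are promoted to int. */
static signed char avg_wide(signed char a, signed char b)
{
    int sum = (int)a + (int)b;          /* range [-256, 254] */
    return (signed char)(sum >> 1);     /* range [-128, 127] */
}

/* The narrowed form: the sum is held in a 16-bit short.  C promotes the
   short back to int for the shift, but the value is the same one that a
   16-bit vector lane would hold.  */
static signed char avg_narrow(signed char a, signed char b)
{
    short sum = (short)(a + b);         /* always fits in 16 bits */
    return (signed char)(sum >> 1);
}

/* Exhaustively compare both forms over every pair of signed char inputs. */
static int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = SCHAR_MIN; a <= SCHAR_MAX; a++)
        for (int b = SCHAR_MIN; b <= SCHAR_MAX; b++)
            if (avg_wide((signed char)a, (signed char)b)
                != avg_narrow((signed char)a, (signed char)b))
                mismatches++;
    return mismatches;
}
```

Under those assumptions count_mismatches () returns 0, because the sum of
two signed chars never leaves the range of a short.  The sketch only
illustrates the arithmetic; it says nothing about how the vectorizer
decides when the narrowing is profitable.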
