On Fri, May 17, 2024 at 12:22 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 17 May 2024, Manolis Tsamis wrote:
>
> > Hi Richard,
> >
> > While I was re-testing the latest version of this patch I noticed that
> > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> > we generate one instruction more:
> >
> >         sbfiz   x1, x1, 4, 32
> >         stp     x29, x30, [sp, -16]!
> >         add     x1, x1, 16
> >         mov     x29, sp
> >         sub     sp, sp, x1
> >         mov     x0, sp
> >         bl      foo
> >
> > Instead of:
> >
> >         stp     x29, x30, [sp, -16]!
> >         add     w1, w1, 1
> >         mov     x29, sp
> >         sub     sp, sp, w1, sxtw 4
> >         mov     x0, sp
> >         bl      foo
> >
> > I've looked at it but can't really find a way to solve the regression.
> > Any thoughts on this?
>
> Can you explain what goes wrong?  As I said rewriting parts of
> address calculation is tricky, there's always the chance that some
> cases regress (see your observation in comment#4 of the PR).
>

In this case the int -> sizetype cast ends up happening earlier. Instead of

  _7 = y_6(D) + 1;
  _1 = (sizetype) _7;
  _2 = _1 * 16;

we get

  _13 = (sizetype) y_6(D);
  _15 = _13 + 1;
  _2 = _15 * 16;

and then in RTL we have

x1 = ((sizetype) x1) << 4
sp = sp - (x1 + 16)

instead of

x1 = x1 + 1
sp = sp - ((sizetype) x1) << 4

which does not combine into the single sub sp, sp, w1, sxtw 4 instruction.

But more importantly, I realized that (in this case among others) the
pattern is undone by (A * C) +- (B * C) -> (A +- B) * C and (A * C) +- A
-> A * (C +- 1). AFAIK having both a pattern and its reverse in match.pd
is a bad thing, so something needs to change.
One idea would be to keep only the larger pattern, ((T)(A + CST1)) * CST2
+ CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). That is not enough
to handle the testcases in the ticket, but it does help in other cases.

Manolis

> Note that I still believe that avoiding the early and premature
> promotion of the addition to unsigned is a good thing.
>
> Note the testcase in the PR is fixed with -fwrapv because then
> we do _not_ perform this premature optimization.  Without -fwrapv
> the optimization is valid but as you note we do not perform it
> consistently - otherwise we wouldn't regress.
>
> Richard.
>
>
>
> > Thanks,
> > Manolis
> >
> >
> >
> > On Thu, May 16, 2024 at 11:15 AM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsa...@vrull.eu> 
> > > wrote:
> > > >
> > > > New patch with the requested changes can be found below.
> > > >
> > > > I don't know how much this affects SCEV, but I do believe that we
> > > > should incorporate this change somehow. I've seen various cases of
> > > > suboptimal address calculation codegen that boil down to this.
> > >
> > > This misses the ChangeLog (I assume it's unchanged) and indent
> > > of the match.pd part is now off.
> > >
> > > Please fix that, the patch is OK with that change.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > >  gcc/match.pd                    | 31 +++++++++++++++++++++++++++++++
> > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > >  2 files changed, 47 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 07e743ae464..1d642c205f0 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >         (plus (convert @0) (op @2 (convert @1))))))
> > > >  #endif
> > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > +#if GIMPLE
> > > > +  (for op (plus minus)
> > > > +   (simplify
> > > > +    (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > +          INTEGER_CST@3)
> > > > +     (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > +          && INTEGRAL_TYPE_P (type)
> > > > +          && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > +          && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > +          && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > +          && TYPE_OVERFLOW_WRAPS (type))
> > > > +       (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > > > +#endif
> > > > +
> > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > +#if GIMPLE
> > > > +  (for op (plus minus)
> > > > +   (simplify
> > > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > +     (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > +          && INTEGRAL_TYPE_P (type)
> > > > +          && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > +          && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > +          && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > +          && TYPE_OVERFLOW_WRAPS (type))
> > > > +       (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > > > +#endif
> > > > +
> > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > >     to a simple value.  */
> > > >    (for op (plus minus)
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c 
> > > > b/gcc/testsuite/gcc.dg/pr109393.c
> > > > new file mode 100644
> > > > index 00000000000..e9051273672
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > @@ -0,0 +1,16 @@
> > > > +/* PR tree-optimization/109393 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > +
> > > > +int foo(int *a, int j)
> > > > +{
> > > > +  int k = j - 1;
> > > > +  return a[j - 1] == a[k];
> > > > +}
> > > > +
> > > > +int bar(int *a, int j)
> > > > +{
> > > > +  int k = j - 1;
> > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > +}
> > > > --
> > > > 2.44.0
> > > >
> > > >
> > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis 
> > > > <manolis.tsa...@vrull.eu> wrote:
> > > > >
> > > > > The original motivation for this pattern was that the following 
> > > > > function does
> > > > > not fold to 'return 1':
> > > > >
> > > > > int foo(int *a, int j)
> > > > > {
> > > > >   int k = j - 1;
> > > > >   return a[j - 1] == a[k];
> > > > > }
> > > > >
> > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as 
> > > > > part of
> > > > > address calculations (e.g. arrays). These patterns help fold and 
> > > > > simplify more
> > > > > expressions.
> > > > >
> > > > >         PR tree-optimization/109393
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > > > >           ((T)(A +- CST1)) * CST2 + CST3.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.dg/pr109393.c: New test.
> > > > >
> > > > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
> > > > > ---
> > > > >
> > > > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > >  2 files changed, 46 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index d401e7503e6..13c828ba70d 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > >         (plus (convert @0) (op @2 (convert @1))))))
> > > > >  #endif
> > > > >
> > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > > +#if GIMPLE
> > > > > +  (for op (plus minus)
> > > > > +   (simplify
> > > > > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) 
> > > > > INTEGER_CST@3)
> > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) 
> > > > > @3)))))
> > > > > +#endif
> > > > > +
> > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > > +#if GIMPLE
> > > > > +  (for op (plus minus)
> > > > > +   (simplify
> > > > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > > > > +#endif
> > > > > +
> > > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be 
> > > > > simplified
> > > > >     to a simple value.  */
> > > > >    (for op (plus minus)
> > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c 
> > > > > b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > new file mode 100644
> > > > > index 00000000000..e9051273672
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > @@ -0,0 +1,16 @@
> > > > > +/* PR tree-optimization/109393 */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > > +
> > > > > +int foo(int *a, int j)
> > > > > +{
> > > > > +  int k = j - 1;
> > > > > +  return a[j - 1] == a[k];
> > > > > +}
> > > > > +
> > > > > +int bar(int *a, int j)
> > > > > +{
> > > > > +  int k = j - 1;
> > > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > > +}
> > > > > --
> > > > > 2.34.1
> > > > >
> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)