On Fri, May 17, 2024 at 12:22 PM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 17 May 2024, Manolis Tsamis wrote:
>
> > Hi Richard,
> >
> > While I was re-testing the latest version of this patch I noticed that
> > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> > we generate one instruction more:
> >
> >         sbfiz   x1, x1, 4, 32
> >         stp     x29, x30, [sp, -16]!
> >         add     x1, x1, 16
> >         mov     x29, sp
> >         sub     sp, sp, x1
> >         mov     x0, sp
> >         bl      foo
> >
> > Instead of:
> >
> >         stp     x29, x30, [sp, -16]!
> >         add     w1, w1, 1
> >         mov     x29, sp
> >         sub     sp, sp, w1, sxtw 4
> >         mov     x0, sp
> >         bl      foo
> >
> > I've looked at it but can't really find a way to solve the regression.
> > Any thoughts on this?
>
> Can you explain what goes wrong?  As I said, rewriting parts of
> address calculation is tricky; there's always the chance that some
> cases regress (see your observation in comment #4 of the PR).
>
In this case the int -> sizetype cast ends up happening earlier. Instead of

  _7 = y_6(D) + 1;
  _1 = (sizetype) _7;
  _2 = _1 * 16;

we get

  _13 = (sizetype) y_6(D);
  _15 = _13 + 1;
  _2 = _15 * 16;

and then in RTL we have

  x1 = ((sizetype) x1) << 4
  sp = sp - (x1 + 16)

instead of

  x1 = x1 + 1
  sp = sp - ((sizetype) x1) << 4

which doesn't form sub sp, sp, w1, sxtw 4.

But more importantly, I realized that (in this case among others) the
pattern is undone by (A * C) +- (B * C) -> (A +- B) * C and
(A * C) +- A -> A * (C +- 1). AFAIK having one pattern and its reverse
is a bad thing, so something needs to be changed.

One idea could be to keep only the larger one,
((T)(A + CST1)) * CST2 + CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3).
It's not enough to deal with the testcases of the ticket, but it does
help in other cases.

Manolis

> Note that I still believe that avoiding the early and premature
> promotion of the addition to unsigned is a good thing.
>
> Note the testcase in the PR is fixed with -fwrapv because then
> we do _not_ perform this premature optimization.  Without -fwrapv
> the optimization is valid, but as you note we do not perform it
> consistently - otherwise we wouldn't regress.
>
> Richard.
>
> >
> > Thanks,
> > Manolis
> >
> >
> >
> > On Thu, May 16, 2024 at 11:15 AM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis <manolis.tsa...@vrull.eu> wrote:
> > > >
> > > > New patch with the requested changes can be found below.
> > > >
> > > > I don't know how much this affects SCEV, but I do believe that we
> > > > should incorporate this change somehow. I've seen various cases of
> > > > suboptimal address calculation codegen that boil down to this.
> > >
> > > This misses the ChangeLog (I assume it's unchanged) and the indent
> > > of the match.pd part is now off.
> > >
> > > Please fix that; the patch is OK with that change.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > >  gcc/match.pd                    | 31 +++++++++++++++++++++++++++++++
> > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > >  2 files changed, 47 insertions(+)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 07e743ae464..1d642c205f0 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >      (plus (convert @0) (op @2 (convert @1))))))
> > > >  #endif
> > > >
> > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > +   -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > +#if GIMPLE
> > > > +  (for op (plus minus)
> > > > +   (simplify
> > > > +    (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > +      INTEGER_CST@3)
> > > > +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > +         && INTEGRAL_TYPE_P (type)
> > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > +     (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > > > +#endif
> > > > +
> > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > +#if GIMPLE
> > > > +  (for op (plus minus)
> > > > +   (simplify
> > > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > +    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > +         && INTEGRAL_TYPE_P (type)
> > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > +     (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > > > +#endif
> > > > +
> > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > >     to a simple value.  */
> > > >  (for op (plus minus)
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > new file mode 100644
> > > > index 00000000000..e9051273672
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > @@ -0,0 +1,16 @@
> > > > +/* PR tree-optimization/109393 */
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > +
> > > > +int foo(int *a, int j)
> > > > +{
> > > > +  int k = j - 1;
> > > > +  return a[j - 1] == a[k];
> > > > +}
> > > > +
> > > > +int bar(int *a, int j)
> > > > +{
> > > > +  int k = j - 1;
> > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > +}
> > > > --
> > > > 2.44.0
> > > >
> > > >
> > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis
> > > > <manolis.tsa...@vrull.eu> wrote:
> > > > >
> > > > > The original motivation for this pattern was that the following function does
> > > > > not fold to 'return 1':
> > > > >
> > > > > int foo(int *a, int j)
> > > > > {
> > > > >   int k = j - 1;
> > > > >   return a[j - 1] == a[k];
> > > > > }
> > > > >
> > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as part of
> > > > > address calculations (e.g. arrays). These patterns help fold and simplify more
> > > > > expressions.
> > > > >
> > > > >         PR tree-optimization/109393
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > > > >         ((T)(A +- CST1)) * CST2 + CST3.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >         * gcc.dg/pr109393.c: New test.
> > > > >
> > > > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
> > > > > ---
> > > > >
> > > > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > >  2 files changed, 46 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index d401e7503e6..13c828ba70d 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > >      (plus (convert @0) (op @2 (convert @1))))))
> > > > >  #endif
> > > > >
> > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > +   -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > +  (simplify
> > > > > +   (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) INTEGER_CST@3)
> > > > > +   (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > +        && TREE_CODE (type) == INTEGER_TYPE
> > > > > +        && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > +        && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > +        && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > +        && TYPE_OVERFLOW_WRAPS (type))
> > > > > +    (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) @3)))))
> > > > > +#endif
> > > > > +
> > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > +  (simplify
> > > > > +   (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > +   (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > +        && TREE_CODE (type) == INTEGER_TYPE
> > > > > +        && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > +        && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > +        && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > +        && TYPE_OVERFLOW_WRAPS (type))
> > > > > +    (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > > > > +#endif
> > > > > +
> > > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > > >     to a simple value.  */
> > > > >  (for op (plus minus)
> > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > new file mode 100644
> > > > > index 00000000000..e9051273672
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > @@ -0,0 +1,16 @@
> > > > > +/* PR tree-optimization/109393 */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > > +
> > > > > +int foo(int *a, int j)
> > > > > +{
> > > > > +  int k = j - 1;
> > > > > +  return a[j - 1] == a[k];
> > > > > +}
> > > > > +
> > > > > +int bar(int *a, int j)
> > > > > +{
> > > > > +  int k = j - 1;
> > > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > > +}
> > > > > --
> > > > > 2.34.1
> > > > >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)