On Wed, Oct 16, 2013 at 7:59 AM, Jeff Law <l...@redhat.com> wrote: > On 07/08/13 11:45, Kai Tietz wrote: >> >> Hello, >> >> These passes are implementing type-demotion (type sinking into >> statements) for some gimple statements. I limitted inital >> implementation to the set of multiply, addition, substraction, and >> binary-and/or/xor. Additional this pass adds some rules to sink >> type-cast sequences - eg. (int) (short) x; with char as type of x. >> This special handing in this pass of such type-sequence >> simplification is necessary to avoid too complex cast-sequences by >> type-unsigned conversion used by this pass to avoid undefined >> overflow behaviour. >> >> I will sent separate patch with some test-cases to demonstrate and >> verify operation of this new optimization. Just one sample I will >> cite here to demonstrate operation of type-demotion pass. > > So I think we want to come back to this patch and make some decisions about > how to go forward. > > I've been reviewing all the discussions back through last year and I think > the single biggest thing we need to realize is that there's nothing in this > patch that *couldn't* be handled by tree-ssa-forwprop with a suitable amount > of following the use-def chains through casts in various transformations and > determining when those casts can be safely ignored. However, I don't think > extending tree-ssa-forwprop in this way is wise. > > As I see it, the value in this patch is it allows us to avoid that nonsense > by essentially reformulating this a problem of moving and reformulating the > casts in the IL. > > I see two primary effects of type sinking.
Note it was called type demotion ;) > First and probably the most > important in my mind is by sinking a cast through its uses the various > transformations we already perform are more likely to apply *without* > needing to handle optimizing through typecasts explicitly. I would say it is desirable to express arithmetic in the smallest possible types (see what premature optimization the C family frontends do to narrow operations again after C integer promotion applied). You need some kind of range information to do this, thus either integrate it into VRP (there is already code that does this there) or use range information from VRP which we now preserve. > The second primary effect is, given two casts where the first indirectly > feeds the second (ie, the first feeds some statement, which then feeds the > second cast), if we're able to sink the first cast, we end up with the first > cast directly feeding the second cast. When this occurs one of the two > casts can often be eliminated. Sadly, I didn't keep any of those test > files, but I regularly saw them in GCC bootstraps. This transformation is applied both by fold-const.c and by SSA forwprop (our GIMPLE combiner). Doing it in yet another pass looks wrong (and it isn't type demotion but also can be promotion). > Kai, you mentioned you had more tests, but I never saw those posted. I > pulled together a handful of tests from the PRs & discussion threads. Some > may be duplicates of yours (mine are attached). If you could post yours as > well, it'd be helpful. Not all of mine are helped by this patch, but at > least two are. > > There was some question about what to do with PROMOTE_MODE based targets. > On those targets there's a lot of value in getting something into word mode > and doing everything in word mode. I think that can/should be a separate > issue -- ISTM that ought to be handled fairly late in the tree optimizers. > I don't see a strong argument for gating this patch on catering to > PROMOTE_MODE. In contrast to the desire of expressing operations in the smallest required type there is the desire of exposing the effect of PROMOTE_MODE on GIMPLE instead of only during RTL expansion. This is because the truncations (sext and zext) PROMOTE_MODE introduced are easier to optimize away when range information is available (see the attempts to address this at RTL expansion time from Kugan from Linaro). > > Similarly, I know there's a type hoisting patch that's also queued up. I > think it should be handled separately as well. I think we need to paint a picture of the final result - what is the main objective of the various(?!) passes in question? Where do we do the same kind of transformation already? AFAIK we have many places that optimize a series of conversions (that's a combine-like problem), I'd call that conversion contraction. We do have a set of foldings in the C family frontends, in fold and in forwprop that try to un-do the effect of C integer promotions (that's also a combine-like problem). We have no pass that tries to promote or demote the types of variables with using a data-flow approach (VRP comes closest, but the transform is again pattern-matching, thus combine-like). I do not object to adding this kind of pass, but I suggest to look at the targets desires when implementing it - which eventually means to honor PROMOTE_MODE (be careful about pass placement here - you want this after loop optimizations like vectorization but possibly before induction variable optimization). Richard. > > >> ------end------ >> >> You will notice that by typedemote2 pass the 's += t + 0x12345600;' >> expression gets simplified to 's += t + 0;'. >> >> I have added an additional early pass "typedemote1" to this patch for >> simple cases types can be easily sunken into statement without >> special unsigned-cast for overflow-case. Jakub asked for it. Tests >> have shown that this pass does optimizations in pretty few cases. As >> example in testsuite see for example pr46867.c testcase. The second >> pass (I put it behind first vrp pass to avoid testcase-conflicts) >> uses 'unsigned'-type casts to avoid undefined overflow behavior. This >> pass has much more hits in standard code, but seems to trigger some >> regressions in vect pass: >> >> List of regi >> >> FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect "vectorizing >> stmts using SLP" 1 FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto >> scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected" >> 1 FAIL: gcc.dg/vect/slp-cond-3.c -flto scan-tree-dump-times vect >> "vectorizing stmts using SLP" 1 FAIL: gcc.target/i386/rotate-1.c >> scan-assembler ro[lr]b > > Have you analyzed these failures? Are they a result of the casts ending up > in inconvenient locations or is there something more subtle going on here? > We certainly need to understand precisely what's going on with these > regressions before we can move forward. > > As for the patch itself. Obviously it needs updates as a result of David's > work on the pass manager class, and other changes on the trunk since July. > > > tree-ssa-te.c needs a file block comment which describes what the patch is > trying to accomplish. > > Are all the includes necessary? cfgloop.h hash-table.h, others? We're > trying to clean things up, new files ought to be relatively clean as they go > in rather than making things worse :-) > > > build_and_add_sum is poorly named. It handles far more operations than its > name suggests. Similarly its embedded comments still refer to "addition > statement" and clearly need updating. It may need updating with the > streamlined code to build up statements as well. > > Hmm, I don't think I'm going to call out all the updates to deal with codes > in the trunk codebase -- I'm going to assume you'll fix those as part of > trying to get this running on the trunk again. > > > The code to find an insertion point seems to duplicate two large blobs of > code, one when we insert relative to op2, the other when we insert relative > to op1. ISTM that code should be refactored to avoid the duplication. Once > that refactoring is done, build_and_add_sum is going to be pretty damn > simple and easy to understand. > > Is there a good reason why we have a recursive dead code eliminator in this > pass? ie, is there a good reason why we don't just leave that stuff in the > IL and let our standard DCE take care of it? (remove_stmt_chain and its > caller is what I'm referring to). Duplicating DCE, even a trivial one like > this is prone to long term maintenance problems. Do we gain something by > cleaning up after ourselves? Are the conditions under which we optimize so > strict that we don't have to perform all the sanity checks before removing a > statement that we find in tree-ssa-dce.c? > > > gen_cast seems to return either a INTEGER_CST or an expression -- it doesn't > return an assignment. It seems like the comment at the start of that > function needs to be updated. Shouldn't the dumping be conditional on > TDF_DETAILS? > > > I'm a bit surprised we don't have an equivalent of demote_cast_p somewhere. > If not, it feels like that is generic enough that we'd like it elsewhere so > that it can be re-used. > > In demote_into, use (T) in the comment to refer to the type rather than > (typ). The former is used elsewhere in this file and all over fold-const.c > and other places. It's a minor nit, but the consistency helps folks reading > the code in the future. > > Please use consistent grouping in the switch statement in demote_into: > > > + case MULT_EXPR: > + case PLUS_EXPR: case MINUS_EXPR: > > I think GNU standards recommend each on its own line. If you're going to > group them, then ISTM all three belong on the same line. Whatever style you > use, use it consistently in that function please. > > Presumably you don't group these with their related logicals because > arithmetics have to deal with overflow? > > > I'm going to assume the conditions under which you apply the transformations > and any adjustments you make are correct. It might help to add some > comments to the handling of the MULT, PLUS & MINUS. You've got 3 strategies > in there, but no comments as to why you use a particular one at any given > time. Similarly for MIN_EXPR, MAX_EXPR > > > > + /* (NEWTYPE) (X >> Y) can be transformed to > + ((NEWTYPE) X) >> Y, if NEWTYPE == X and > + SIGN(NEWTYPE) == SIGN(X). */ > > The comments here are a bit confusing -- you're talking about equality of a > type and an object. I'm guessing you really mean NEWTYPE == TYPEOF (X)? I > think this occurred elsewhere as well, the mixing of objects and types is a > bit too free IMHO. Better to be explicit than assume the reader will make > the leap that you're talking about TYPEOF (X). > > Again, it seems like dumping should be dependent on TDF_DETAILS. > > > > In execute_type_demote_int, you've got two big conditionals. The first > checks the lhs, the second the rhs. I'd separate them with one line of > vertical whitespace and add a comment before each conditional explaining in > english what each is doing. I can figure it out from reading the code, but > it'd be easier to just read a well written comment. > > There's a comment in execute_type_demote_int > > /* OCCURS_IN_ABNORMAL_PHI */ > > What does that comment mean? It doesn't seem to directly relate to any of > the code in that file. > > I'm going to assume your iterator is correct in the cases where you're able > to optimize and that you don't end up skipping statements unexpectedly. > > > You compute CDI_DOMINATORS, but I don't see you use dominators -- you > certainly do use post-domination. > > > > > Seems like a lot of stuff but I think this is pretty close. The biggest > issue in my mind is we need a clear understanding of the vectorizer > regressions. > > We can kick this around more when we talk later today. > > Cheers, > jeff > > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c > new file mode 100644 > index 0000000..ba395e4 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +signed char a[1024], b[1024]; > + > +void > +baz (void) > +{ > + int i, s, t; > + for (i = 0; i < 1024; i++) > + { s = a[i]; t = b[i]; s += t + 0x12345600; a[i] = s; } > +} > + > +/* { dg-final { scan-tree-dump-times "305419776" 0 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > + > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c > new file mode 100644 > index 0000000..8185c69 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int foo (const unsigned char *tmp, int i, int val) > +{ > + return (unsigned char)(((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + > val)<0)?0:(tmp[i] + val))); > +} > + > +int bar (const unsigned char *tmp, int i, int val) > +{ > + int x = (((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + val)<0)?0:(tmp[i] + > val))); > + return (unsigned char)x; > +} > + > + > +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" { xfail *-*-* > } } } */ > +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" { xfail *-*-* > } } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > + > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c > new file mode 100644 > index 0000000..15a493e > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > + > +int foo (const unsigned char *a, int b, int c) > +{ > + int x = (unsigned char) (a[b] + c); > + int y = a[b] + c; > + int z = (unsigned char) y; > + return x == z; > +} > + > + > +/* { dg-final { scan-tree-dump-times "return 1;" 1 "optimized" { xfail > *-*-* } } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > + > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c > new file mode 100644 > index 0000000..f185526 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c > @@ -0,0 +1,34 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > + > + > +int f1 (int a, int b, int c) > +{ > + if ((a + b) == (c + a)) > + return 1; > + return 0; > +} > + > +int f2 (int a, int b, int c) > +{ > + if ((a ^ b) == (a ^ c)) > + return 1; > + return 0; > +} > + > + > +int f3 (int a, int b) > +{ > + if (-a == (b - a)) > + return 1; > + return 0; > +} > + > + > + > +/* { dg-final { scan-tree-dump-not "\\+" "optimized" { xfail *-*-* } } } */ > +/* { dg-final { scan-tree-dump-not "\\^" "optimized" } } */ > +/* { dg-final { scan-tree-dump-not "\\-" "optimized" { xfail *-*-* } } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > + > diff --git a/gcc/testsuite/gcc.target/i386/pr45397-4.c > b/gcc/testsuite/gcc.target/i386/pr45397-4.c > new file mode 100644 > index 0000000..9bf66b7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr45397-4.c > @@ -0,0 +1,28 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-optimized -mavx" } */ > + > + > +signed char a[1024], b[1024]; > + > +void > +foo (void) > +{ > + int i, s, t; > + for (i = 0; i < 1024; i++) > + { s = a[i]; t = b[i]; s += t; a[i] = s; } > +} > + > +void > +bar (void) > +{ > + int i; > + for (i = 0; i < 1024; i++) > + a[i] += b[i]; > +} > + > + > + > +/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 6 "optimized" } } > */ > +/* { dg-final { scan-tree-dump-not "VEC_PACK_TRUNC_EXPR" "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > + > diff --git a/gcc/testsuite/gcc.target/i386/pr47477-1.c > b/gcc/testsuite/gcc.target/i386/pr47477-1.c > new file mode 100644 > index 0000000..a70ce87 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr47477-1.c > @@ -0,0 +1,29 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-optimized -m32" } */ > + > + > +void * > +add (void *a, void *b) > +{ > + return (void *)(__INTPTR_TYPE__) ((long long)(__INTPTR_TYPE__) a + ((long > long)(__INTPTR_TYPE__) b & ~1L)); > +} > + > +void * > +bar (void *a, void *b) > +{ > + long long tmp = (long long)(__INTPTR_TYPE__) a + ((long > long)(__INTPTR_TYPE__) b & ~1L); > + return (void *)(__INTPTR_TYPE__) tmp; > +} > + > + > + > + > +/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 4 "optimized" } } > */ > +/* { dg-final { scan-tree-dump-times "& 4294967294" 2 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times "\\+" 2 "optimized" } } */ > +/* { dg-final { scan-tree-dump-times "\\(void \\*\\)" 2 "optimized" } } */ > +/* { dg-final { scan-tree-dump-not "\\(long long int\\)" "optimized" } } */ > +/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > + > + >