On 07/08/13 11:45, Kai Tietz wrote:
Hello,

These passes are implementing type-demotion (type sinking into
statements) for some gimple statements.  I limitted inital
implementation to the set of multiply, addition, substraction, and
binary-and/or/xor.  Additional this pass adds some rules to sink
type-cast sequences - eg. (int) (short) x; with char as type of x.
This special handing in this pass of such type-sequence
simplification is necessary to avoid too complex cast-sequences by
type-unsigned conversion used by this pass to avoid undefined
overflow behaviour.

I will sent separate patch with some test-cases to demonstrate and
verify operation of this new optimization.  Just one sample I will
cite here to demonstrate operation of type-demotion pass.
So I think we want to come back to this patch and make some decisions about how to go forward.

I've been reviewing all the discussions back through last year and I think the single biggest thing we need to realize is that there's nothing in this patch that *couldn't* be handled by tree-ssa-forwprop with a suitable amount of following the use-def chains through casts in various transformations and determining when those casts can be safely ignored. However, I don't think extending tree-ssa-forwprop in this way is wise.

As I see it, the value in this patch is it allows us to avoid that nonsense by essentially reformulating this a problem of moving and reformulating the casts in the IL.

I see two primary effects of type sinking. First and probably the most important in my mind is by sinking a cast through its uses the various transformations we already perform are more likely to apply *without* needing to handle optimizing through typecasts explicitly.

The second primary effect is, given two casts where the first indirectly feeds the second (ie, the first feeds some statement, which then feeds the second cast), if we're able to sink the first cast, we end up with the first cast directly feeding the second cast. When this occurs one of the two casts can often be eliminated. Sadly, I didn't keep any of those test files, but I regularly saw them in GCC bootstraps.

Kai, you mentioned you had more tests, but I never saw those posted. I pulled together a handful of tests from the PRs & discussion threads. Some may be duplicates of yours (mine are attached). If you could post yours as well, it'd be helpful. Not all of mine are helped by this patch, but at least two are.

There was some question about what to do with PROMOTE_MODE based targets. On those targets there's a lot of value in getting something into word mode and doing everything in word mode. I think that can/should be a separate issue -- ISTM that ought to be handled fairly late in the tree optimizers. I don't see a strong argument for gating this patch on catering to PROMOTE_MODE.

Similarly, I know there's a type hoisting patch that's also queued up. I think it should be handled separately as well.


------end------

You will notice that by typedemote2 pass the 's += t + 0x12345600;'
expression gets simplified to 's += t + 0;'.

I have added an additional early pass "typedemote1" to this patch for
simple cases types can be easily sunken into statement without
special unsigned-cast for overflow-case.   Jakub asked for it. Tests
have shown that this pass does optimizations in pretty few cases.  As
example in testsuite see for example pr46867.c testcase. The second
pass (I put it behind first vrp pass to avoid testcase-conflicts)
uses 'unsigned'-type casts to avoid undefined overflow behavior. This
pass has much more hits in standard code, but seems to trigger some
regressions in vect pass:

List of regi

FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1 FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto
scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected"
1 FAIL: gcc.dg/vect/slp-cond-3.c -flto  scan-tree-dump-times vect
"vectorizing stmts using SLP" 1 FAIL: gcc.target/i386/rotate-1.c
scan-assembler ro[lr]b
Have you analyzed these failures? Are they a result of the casts ending up in inconvenient locations or is there something more subtle going on here? We certainly need to understand precisely what's going on with these regressions before we can move forward.

As for the patch itself. Obviously it needs updates as a result of David's work on the pass manager class, and other changes on the trunk since July.


tree-ssa-te.c needs a file block comment which describes what the patch is trying to accomplish.

Are all the includes necessary? cfgloop.h hash-table.h, others? We're trying to clean things up, new files ought to be relatively clean as they go in rather than making things worse :-)


build_and_add_sum is poorly named. It handles far more operations than its name suggests. Similarly its embedded comments still refer to "addition statement" and clearly need updating. It may need updating with the streamlined code to build up statements as well.

Hmm, I don't think I'm going to call out all the updates to deal with codes in the trunk codebase -- I'm going to assume you'll fix those as part of trying to get this running on the trunk again.


The code to find an insertion point seems to duplicate two large blobs of code, one when we insert relative to op2, the other when we insert relative to op1. ISTM that code should be refactored to avoid the duplication. Once that refactoring is done, build_and_add_sum is going to be pretty damn simple and easy to understand.

Is there a good reason why we have a recursive dead code eliminator in this pass? ie, is there a good reason why we don't just leave that stuff in the IL and let our standard DCE take care of it? (remove_stmt_chain and its caller is what I'm referring to). Duplicating DCE, even a trivial one like this is prone to long term maintenance problems. Do we gain something by cleaning up after ourselves? Are the conditions under which we optimize so strict that we don't have to perform all the sanity checks before removing a statement that we find in tree-ssa-dce.c?


gen_cast seems to return either a INTEGER_CST or an expression -- it doesn't return an assignment. It seems like the comment at the start of that function needs to be updated. Shouldn't the dumping be conditional on TDF_DETAILS?


I'm a bit surprised we don't have an equivalent of demote_cast_p somewhere. If not, it feels like that is generic enough that we'd like it elsewhere so that it can be re-used.

In demote_into, use (T) in the comment to refer to the type rather than (typ). The former is used elsewhere in this file and all over fold-const.c and other places. It's a minor nit, but the consistency helps folks reading the code in the future.

Please use consistent grouping in the switch statement in demote_into:

+    case MULT_EXPR:
+    case PLUS_EXPR: case MINUS_EXPR:

I think GNU standards recommend each on its own line. If you're going to group them, then ISTM all three belong on the same line. Whatever style you use, use it consistently in that function please.

Presumably you don't group these with their related logicals because arithmetics have to deal with overflow?


I'm going to assume the conditions under which you apply the transformations and any adjustments you make are correct. It might help to add some comments to the handling of the MULT, PLUS & MINUS. You've got 3 strategies in there, but no comments as to why you use a particular one at any given time. Similarly for MIN_EXPR, MAX_EXPR


+      /* (NEWTYPE) (X >> Y) can be transformed to
+         ((NEWTYPE) X) >> Y, if NEWTYPE == X and
+         SIGN(NEWTYPE) == SIGN(X).  */

The comments here are a bit confusing -- you're talking about equality of a type and an object. I'm guessing you really mean NEWTYPE == TYPEOF (X)? I think this occurred elsewhere as well, the mixing of objects and types is a bit too free IMHO. Better to be explicit than assume the reader will make the leap that you're talking about TYPEOF (X).

Again, it seems like dumping should be dependent on TDF_DETAILS.



In execute_type_demote_int, you've got two big conditionals. The first checks the lhs, the second the rhs. I'd separate them with one line of vertical whitespace and add a comment before each conditional explaining in english what each is doing. I can figure it out from reading the code, but it'd be easier to just read a well written comment.

There's a comment in execute_type_demote_int

/* OCCURS_IN_ABNORMAL_PHI */

What does that comment mean? It doesn't seem to directly relate to any of the code in that file.

I'm going to assume your iterator is correct in the cases where you're able to optimize and that you don't end up skipping statements unexpectedly.


You compute CDI_DOMINATORS, but I don't see you use dominators -- you certainly do use post-domination.




Seems like a lot of stuff but I think this is pretty close. The biggest issue in my mind is we need a clear understanding of the vectorizer regressions.

We can kick this around more when we talk later today.

Cheers,
jeff



diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
new file mode 100644
index 0000000..ba395e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+signed char a[1024], b[1024];
+
+void
+baz (void)
+{
+  int i, s, t;
+  for (i = 0; i < 1024; i++)
+    { s = a[i]; t = b[i]; s += t + 0x12345600; a[i] = s; }
+}
+
+/* { dg-final { scan-tree-dump-times "305419776" 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
new file mode 100644
index 0000000..8185c69
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo (const unsigned char *tmp, int i, int val)
+{
+  return (unsigned char)(((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + 
val)<0)?0:(tmp[i] + val)));
+}
+
+int bar (const unsigned char *tmp, int i, int val)
+{
+  int x = (((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + val)<0)?0:(tmp[i] + val)));
+  return (unsigned char)x;
+}
+
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" { xfail *-*-* } 
} } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" { xfail *-*-* } 
} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
new file mode 100644
index 0000000..15a493e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+int foo (const unsigned char *a, int b, int c)
+{
+  int x = (unsigned char) (a[b] + c);
+  int y = a[b] + c;
+  int z = (unsigned char) y;
+  return x == z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "return 1;" 1 "optimized" { xfail *-*-* } 
} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
new file mode 100644
index 0000000..f185526
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+
+int f1 (int a, int b, int c)
+{
+  if ((a + b) == (c + a))
+   return 1;
+  return 0;
+}
+
+int f2 (int a, int b, int c)
+{
+  if ((a ^ b) == (a  ^ c))
+   return 1;
+  return 0;
+}
+
+
+int f3 (int a, int b)
+{
+  if (-a == (b - a))
+   return 1;
+  return 0;
+}
+
+
+
+/* { dg-final { scan-tree-dump-not "\\+" "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-not "\\^" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\-" "optimized" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr45397-4.c 
b/gcc/testsuite/gcc.target/i386/pr45397-4.c
new file mode 100644
index 0000000..9bf66b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr45397-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -mavx" } */
+
+
+signed char a[1024], b[1024];
+
+void
+foo (void)
+{
+  int i, s, t;
+  for (i = 0; i < 1024; i++)
+    { s = a[i]; t = b[i]; s += t; a[i] = s; }
+}
+
+void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    a[i] += b[i];
+}
+
+
+
+/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 6 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PACK_TRUNC_EXPR" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr47477-1.c 
b/gcc/testsuite/gcc.target/i386/pr47477-1.c
new file mode 100644
index 0000000..a70ce87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr47477-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -m32" } */
+
+
+void *
+add (void *a, void *b)
+{
+  return (void *)(__INTPTR_TYPE__) ((long long)(__INTPTR_TYPE__) a + ((long 
long)(__INTPTR_TYPE__) b & ~1L));
+}
+
+void *
+bar (void *a, void *b)
+{
+  long long tmp = (long long)(__INTPTR_TYPE__) a + ((long 
long)(__INTPTR_TYPE__) b & ~1L);
+  return (void *)(__INTPTR_TYPE__) tmp;
+}
+
+
+
+
+/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "& 4294967294" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(void \\*\\)" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(long long int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
+

Reply via email to