Re: RFC: Add of type-demotion pass

Jeff Law Tue, 15 Oct 2013 22:59:57 -0700

On 07/08/13 11:45, Kai Tietz wrote:

Hello,


These passes are implementing type-demotion (type sinking into
statements) for some gimple statements.  I limitted inital
implementation to the set of multiply, addition, substraction, and
binary-and/or/xor.  Additional this pass adds some rules to sink
type-cast sequences - eg. (int) (short) x; with char as type of x.
This special handing in this pass of such type-sequence
simplification is necessary to avoid too complex cast-sequences by
type-unsigned conversion used by this pass to avoid undefined
overflow behaviour.

I will sent separate patch with some test-cases to demonstrate and
verify operation of this new optimization.  Just one sample I will
cite here to demonstrate operation of type-demotion pass.

So I think we want to come back to this patch and make some decisionsabout how to go forward.

I've been reviewing all the discussions back through last year and Ithink the single biggest thing we need to realize is that there'snothing in this patch that *couldn't* be handled by tree-ssa-forwpropwith a suitable amount of following the use-def chains through casts invarious transformations and determining when those casts can be safelyignored. However, I don't think extending tree-ssa-forwprop in this wayis wise.

As I see it, the value in this patch is it allows us to avoid thatnonsense by essentially reformulating this a problem of moving andreformulating the casts in the IL.

I see two primary effects of type sinking. First and probably the mostimportant in my mind is by sinking a cast through its uses the varioustransformations we already perform are more likely to apply *without*needing to handle optimizing through typecasts explicitly.

The second primary effect is, given two casts where the first indirectlyfeeds the second (ie, the first feeds some statement, which then feedsthe second cast), if we're able to sink the first cast, we end up withthe first cast directly feeding the second cast. When this occurs oneof the two casts can often be eliminated. Sadly, I didn't keep any ofthose test files, but I regularly saw them in GCC bootstraps.

Kai, you mentioned you had more tests, but I never saw those posted. Ipulled together a handful of tests from the PRs & discussion threads.Some may be duplicates of yours (mine are attached). If you could postyours as well, it'd be helpful. Not all of mine are helped by thispatch, but at least two are.

There was some question about what to do with PROMOTE_MODE basedtargets. On those targets there's a lot of value in getting somethinginto word mode and doing everything in word mode. I think thatcan/should be a separate issue -- ISTM that ought to be handled fairlylate in the tree optimizers. I don't see a strong argument for gatingthis patch on catering to PROMOTE_MODE.

Similarly, I know there's a type hoisting patch that's also queued up.I think it should be handled separately as well.

------end------

You will notice that by typedemote2 pass the 's += t + 0x12345600;'
expression gets simplified to 's += t + 0;'.

I have added an additional early pass "typedemote1" to this patch for
simple cases types can be easily sunken into statement without
special unsigned-cast for overflow-case.   Jakub asked for it. Tests
have shown that this pass does optimizations in pretty few cases.  As
example in testsuite see for example pr46867.c testcase. The second
pass (I put it behind first vrp pass to avoid testcase-conflicts)
uses 'unsigned'-type casts to avoid undefined overflow behavior. This
pass has much more hits in standard code, but seems to trigger some
regressions in vect pass:

List of regi

FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1 FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto
scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected"
1 FAIL: gcc.dg/vect/slp-cond-3.c -flto  scan-tree-dump-times vect
"vectorizing stmts using SLP" 1 FAIL: gcc.target/i386/rotate-1.c
scan-assembler ro[lr]b

Have you analyzed these failures? Are they a result of the casts endingup in inconvenient locations or is there something more subtle going onhere? We certainly need to understand precisely what's going on withthese regressions before we can move forward.

As for the patch itself. Obviously it needs updates as a result ofDavid's work on the pass manager class, and other changes on the trunksince July.

tree-ssa-te.c needs a file block comment which describes what the patchis trying to accomplish.

Are all the includes necessary? cfgloop.h hash-table.h, others? We'retrying to clean things up, new files ought to be relatively clean asthey go in rather than making things worse :-)

build_and_add_sum is poorly named. It handles far more operations thanits name suggests. Similarly its embedded comments still refer to"addition statement" and clearly need updating. It may need updatingwith the streamlined code to build up statements as well.

Hmm, I don't think I'm going to call out all the updates to deal withcodes in the trunk codebase -- I'm going to assume you'll fix those aspart of trying to get this running on the trunk again.

The code to find an insertion point seems to duplicate two large blobsof code, one when we insert relative to op2, the other when we insertrelative to op1. ISTM that code should be refactored to avoid theduplication. Once that refactoring is done, build_and_add_sum is goingto be pretty damn simple and easy to understand.

Is there a good reason why we have a recursive dead code eliminator inthis pass? ie, is there a good reason why we don't just leave thatstuff in the IL and let our standard DCE take care of it?(remove_stmt_chain and its caller is what I'm referring to).Duplicating DCE, even a trivial one like this is prone to long termmaintenance problems. Do we gain something by cleaning up afterourselves? Are the conditions under which we optimize so strict that wedon't have to perform all the sanity checks before removing a statementthat we find in tree-ssa-dce.c?

gen_cast seems to return either a INTEGER_CST or an expression -- itdoesn't return an assignment. It seems like the comment at the start ofthat function needs to be updated. Shouldn't the dumping be conditionalon TDF_DETAILS?

I'm a bit surprised we don't have an equivalent of demote_cast_psomewhere. If not, it feels like that is generic enough that we'd likeit elsewhere so that it can be re-used.

In demote_into, use (T) in the comment to refer to the type rather than(typ). The former is used elsewhere in this file and all overfold-const.c and other places. It's a minor nit, but the consistencyhelps folks reading the code in the future.


Please use consistent grouping in the switch statement in demote_into:

+    case MULT_EXPR:
+    case PLUS_EXPR: case MINUS_EXPR:

I think GNU standards recommend each on its own line. If you're goingto group them, then ISTM all three belong on the same line. Whateverstyle you use, use it consistently in that function please.

Presumably you don't group these with their related logicals becausearithmetics have to deal with overflow?

I'm going to assume the conditions under which you apply thetransformations and any adjustments you make are correct. It might helpto add some comments to the handling of the MULT, PLUS & MINUS. You'vegot 3 strategies in there, but no comments as to why you use aparticular one at any given time. Similarly for MIN_EXPR, MAX_EXPR



+      /* (NEWTYPE) (X >> Y) can be transformed to
+         ((NEWTYPE) X) >> Y, if NEWTYPE == X and
+         SIGN(NEWTYPE) == SIGN(X).  */

The comments here are a bit confusing -- you're talking about equalityof a type and an object. I'm guessing you really mean NEWTYPE == TYPEOF(X)? I think this occurred elsewhere as well, the mixing of objects andtypes is a bit too free IMHO. Better to be explicit than assume thereader will make the leap that you're talking about TYPEOF (X).


Again, it seems like dumping should be dependent on TDF_DETAILS.

In execute_type_demote_int, you've got two big conditionals. The firstchecks the lhs, the second the rhs. I'd separate them with one line ofvertical whitespace and add a comment before each conditional explainingin english what each is doing. I can figure it out from reading thecode, but it'd be easier to just read a well written comment.


There's a comment in execute_type_demote_int

/* OCCURS_IN_ABNORMAL_PHI */

What does that comment mean? It doesn't seem to directly relate to anyof the code in that file.

I'm going to assume your iterator is correct in the cases where you'reable to optimize and that you don't end up skipping statements unexpectedly.

You compute CDI_DOMINATORS, but I don't see you use dominators -- youcertainly do use post-domination.

Seems like a lot of stuff but I think this is pretty close. The biggestissue in my mind is we need a clear understanding of the vectorizerregressions.


We can kick this around more when we talk later today.

Cheers,
jeff

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
new file mode 100644
index 0000000..ba395e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+signed char a[1024], b[1024];
+
+void
+baz (void)
+{
+  int i, s, t;
+  for (i = 0; i < 1024; i++)
+    { s = a[i]; t = b[i]; s += t + 0x12345600; a[i] = s; }
+}
+
+/* { dg-final { scan-tree-dump-times "305419776" 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
new file mode 100644
index 0000000..8185c69
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo (const unsigned char *tmp, int i, int val)
+{
+  return (unsigned char)(((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + 
val)<0)?0:(tmp[i] + val)));
+}
+
+int bar (const unsigned char *tmp, int i, int val)
+{
+  int x = (((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + val)<0)?0:(tmp[i] + val)));
+  return (unsigned char)x;
+}
+
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" { xfail *-*-* } 
} } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" { xfail *-*-* } 
} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
new file mode 100644
index 0000000..15a493e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+int foo (const unsigned char *a, int b, int c)
+{
+  int x = (unsigned char) (a[b] + c);
+  int y = a[b] + c;
+  int z = (unsigned char) y;
+  return x == z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "return 1;" 1 "optimized" { xfail *-*-* } 
} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
new file mode 100644
index 0000000..f185526
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+
+int f1 (int a, int b, int c)
+{
+  if ((a + b) == (c + a))
+   return 1;
+  return 0;
+}
+
+int f2 (int a, int b, int c)
+{
+  if ((a ^ b) == (a  ^ c))
+   return 1;
+  return 0;
+}
+
+
+int f3 (int a, int b)
+{
+  if (-a == (b - a))
+   return 1;
+  return 0;
+}
+
+
+
+/* { dg-final { scan-tree-dump-not "\\+" "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-not "\\^" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\-" "optimized" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr45397-4.c 
b/gcc/testsuite/gcc.target/i386/pr45397-4.c
new file mode 100644
index 0000000..9bf66b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr45397-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -mavx" } */
+
+
+signed char a[1024], b[1024];
+
+void
+foo (void)
+{
+  int i, s, t;
+  for (i = 0; i < 1024; i++)
+    { s = a[i]; t = b[i]; s += t; a[i] = s; }
+}
+
+void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    a[i] += b[i];
+}
+
+
+
+/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 6 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PACK_TRUNC_EXPR" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr47477-1.c 
b/gcc/testsuite/gcc.target/i386/pr47477-1.c
new file mode 100644
index 0000000..a70ce87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr47477-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -m32" } */
+
+
+void *
+add (void *a, void *b)
+{
+  return (void *)(__INTPTR_TYPE__) ((long long)(__INTPTR_TYPE__) a + ((long 
long)(__INTPTR_TYPE__) b & ~1L));
+}
+
+void *
+bar (void *a, void *b)
+{
+  long long tmp = (long long)(__INTPTR_TYPE__) a + ((long 
long)(__INTPTR_TYPE__) b & ~1L);
+  return (void *)(__INTPTR_TYPE__) tmp;
+}
+
+
+
+
+/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "& 4294967294" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(void \\*\\)" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(long long int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
+

Re: RFC: Add of type-demotion pass

Reply via email to