On 07/08/13 11:45, Kai Tietz wrote:
Hello,
These passes are implementing type-demotion (type sinking into
statements) for some gimple statements. I limitted inital
implementation to the set of multiply, addition, substraction, and
binary-and/or/xor. Additional this pass adds some rules to sink
type-cast sequences - eg. (int) (short) x; with char as type of x.
This special handing in this pass of such type-sequence
simplification is necessary to avoid too complex cast-sequences by
type-unsigned conversion used by this pass to avoid undefined
overflow behaviour.
I will sent separate patch with some test-cases to demonstrate and
verify operation of this new optimization. Just one sample I will
cite here to demonstrate operation of type-demotion pass.
So I think we want to come back to this patch and make some decisions
about how to go forward.
I've been reviewing all the discussions back through last year and I
think the single biggest thing we need to realize is that there's
nothing in this patch that *couldn't* be handled by tree-ssa-forwprop
with a suitable amount of following the use-def chains through casts in
various transformations and determining when those casts can be safely
ignored. However, I don't think extending tree-ssa-forwprop in this way
is wise.
As I see it, the value in this patch is it allows us to avoid that
nonsense by essentially reformulating this a problem of moving and
reformulating the casts in the IL.
I see two primary effects of type sinking. First and probably the most
important in my mind is by sinking a cast through its uses the various
transformations we already perform are more likely to apply *without*
needing to handle optimizing through typecasts explicitly.
The second primary effect is, given two casts where the first indirectly
feeds the second (ie, the first feeds some statement, which then feeds
the second cast), if we're able to sink the first cast, we end up with
the first cast directly feeding the second cast. When this occurs one
of the two casts can often be eliminated. Sadly, I didn't keep any of
those test files, but I regularly saw them in GCC bootstraps.
Kai, you mentioned you had more tests, but I never saw those posted. I
pulled together a handful of tests from the PRs & discussion threads.
Some may be duplicates of yours (mine are attached). If you could post
yours as well, it'd be helpful. Not all of mine are helped by this
patch, but at least two are.
There was some question about what to do with PROMOTE_MODE based
targets. On those targets there's a lot of value in getting something
into word mode and doing everything in word mode. I think that
can/should be a separate issue -- ISTM that ought to be handled fairly
late in the tree optimizers. I don't see a strong argument for gating
this patch on catering to PROMOTE_MODE.
Similarly, I know there's a type hoisting patch that's also queued up.
I think it should be handled separately as well.
------end------
You will notice that by typedemote2 pass the 's += t + 0x12345600;'
expression gets simplified to 's += t + 0;'.
I have added an additional early pass "typedemote1" to this patch for
simple cases types can be easily sunken into statement without
special unsigned-cast for overflow-case. Jakub asked for it. Tests
have shown that this pass does optimizations in pretty few cases. As
example in testsuite see for example pr46867.c testcase. The second
pass (I put it behind first vrp pass to avoid testcase-conflicts)
uses 'unsigned'-type casts to avoid undefined overflow behavior. This
pass has much more hits in standard code, but seems to trigger some
regressions in vect pass:
List of regi
FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1 FAIL: gcc.dg/vect/vect-reduc-dot-s8b.c -flto
scan-tree-dump-times vect "vect_recog_widen_mult_pattern: detected"
1 FAIL: gcc.dg/vect/slp-cond-3.c -flto scan-tree-dump-times vect
"vectorizing stmts using SLP" 1 FAIL: gcc.target/i386/rotate-1.c
scan-assembler ro[lr]b
Have you analyzed these failures? Are they a result of the casts ending
up in inconvenient locations or is there something more subtle going on
here? We certainly need to understand precisely what's going on with
these regressions before we can move forward.
As for the patch itself. Obviously it needs updates as a result of
David's work on the pass manager class, and other changes on the trunk
since July.
tree-ssa-te.c needs a file block comment which describes what the patch
is trying to accomplish.
Are all the includes necessary? cfgloop.h hash-table.h, others? We're
trying to clean things up, new files ought to be relatively clean as
they go in rather than making things worse :-)
build_and_add_sum is poorly named. It handles far more operations than
its name suggests. Similarly its embedded comments still refer to
"addition statement" and clearly need updating. It may need updating
with the streamlined code to build up statements as well.
Hmm, I don't think I'm going to call out all the updates to deal with
codes in the trunk codebase -- I'm going to assume you'll fix those as
part of trying to get this running on the trunk again.
The code to find an insertion point seems to duplicate two large blobs
of code, one when we insert relative to op2, the other when we insert
relative to op1. ISTM that code should be refactored to avoid the
duplication. Once that refactoring is done, build_and_add_sum is going
to be pretty damn simple and easy to understand.
Is there a good reason why we have a recursive dead code eliminator in
this pass? ie, is there a good reason why we don't just leave that
stuff in the IL and let our standard DCE take care of it?
(remove_stmt_chain and its caller is what I'm referring to).
Duplicating DCE, even a trivial one like this is prone to long term
maintenance problems. Do we gain something by cleaning up after
ourselves? Are the conditions under which we optimize so strict that we
don't have to perform all the sanity checks before removing a statement
that we find in tree-ssa-dce.c?
gen_cast seems to return either a INTEGER_CST or an expression -- it
doesn't return an assignment. It seems like the comment at the start of
that function needs to be updated. Shouldn't the dumping be conditional
on TDF_DETAILS?
I'm a bit surprised we don't have an equivalent of demote_cast_p
somewhere. If not, it feels like that is generic enough that we'd like
it elsewhere so that it can be re-used.
In demote_into, use (T) in the comment to refer to the type rather than
(typ). The former is used elsewhere in this file and all over
fold-const.c and other places. It's a minor nit, but the consistency
helps folks reading the code in the future.
Please use consistent grouping in the switch statement in demote_into:
+ case MULT_EXPR:
+ case PLUS_EXPR: case MINUS_EXPR:
I think GNU standards recommend each on its own line. If you're going
to group them, then ISTM all three belong on the same line. Whatever
style you use, use it consistently in that function please.
Presumably you don't group these with their related logicals because
arithmetics have to deal with overflow?
I'm going to assume the conditions under which you apply the
transformations and any adjustments you make are correct. It might help
to add some comments to the handling of the MULT, PLUS & MINUS. You've
got 3 strategies in there, but no comments as to why you use a
particular one at any given time. Similarly for MIN_EXPR, MAX_EXPR
+ /* (NEWTYPE) (X >> Y) can be transformed to
+ ((NEWTYPE) X) >> Y, if NEWTYPE == X and
+ SIGN(NEWTYPE) == SIGN(X). */
The comments here are a bit confusing -- you're talking about equality
of a type and an object. I'm guessing you really mean NEWTYPE == TYPEOF
(X)? I think this occurred elsewhere as well, the mixing of objects and
types is a bit too free IMHO. Better to be explicit than assume the
reader will make the leap that you're talking about TYPEOF (X).
Again, it seems like dumping should be dependent on TDF_DETAILS.
In execute_type_demote_int, you've got two big conditionals. The first
checks the lhs, the second the rhs. I'd separate them with one line of
vertical whitespace and add a comment before each conditional explaining
in english what each is doing. I can figure it out from reading the
code, but it'd be easier to just read a well written comment.
There's a comment in execute_type_demote_int
/* OCCURS_IN_ABNORMAL_PHI */
What does that comment mean? It doesn't seem to directly relate to any
of the code in that file.
I'm going to assume your iterator is correct in the cases where you're
able to optimize and that you don't end up skipping statements unexpectedly.
You compute CDI_DOMINATORS, but I don't see you use dominators -- you
certainly do use post-domination.
Seems like a lot of stuff but I think this is pretty close. The biggest
issue in my mind is we need a clear understanding of the vectorizer
regressions.
We can kick this around more when we talk later today.
Cheers,
jeff
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
new file mode 100644
index 0000000..ba395e4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+signed char a[1024], b[1024];
+
+void
+baz (void)
+{
+ int i, s, t;
+ for (i = 0; i < 1024; i++)
+ { s = a[i]; t = b[i]; s += t + 0x12345600; a[i] = s; }
+}
+
+/* { dg-final { scan-tree-dump-times "305419776" 0 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
new file mode 100644
index 0000000..8185c69
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo (const unsigned char *tmp, int i, int val)
+{
+ return (unsigned char)(((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] +
val)<0)?0:(tmp[i] + val)));
+}
+
+int bar (const unsigned char *tmp, int i, int val)
+{
+ int x = (((tmp[i] + val)>0xFF)?0xFF:(((tmp[i] + val)<0)?0:(tmp[i] + val)));
+ return (unsigned char)x;
+}
+
+
+/* { dg-final { scan-tree-dump-times "MIN_EXPR" 2 "optimized" { xfail *-*-* }
} } */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 2 "optimized" { xfail *-*-* }
} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
new file mode 100644
index 0000000..15a493e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+int foo (const unsigned char *a, int b, int c)
+{
+ int x = (unsigned char) (a[b] + c);
+ int y = a[b] + c;
+ int z = (unsigned char) y;
+ return x == z;
+}
+
+
+/* { dg-final { scan-tree-dump-times "return 1;" 1 "optimized" { xfail *-*-* }
} } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
new file mode 100644
index 0000000..f185526
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr45397-5.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+
+int f1 (int a, int b, int c)
+{
+ if ((a + b) == (c + a))
+ return 1;
+ return 0;
+}
+
+int f2 (int a, int b, int c)
+{
+ if ((a ^ b) == (a ^ c))
+ return 1;
+ return 0;
+}
+
+
+int f3 (int a, int b)
+{
+ if (-a == (b - a))
+ return 1;
+ return 0;
+}
+
+
+
+/* { dg-final { scan-tree-dump-not "\\+" "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-not "\\^" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\-" "optimized" { xfail *-*-* } } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr45397-4.c
b/gcc/testsuite/gcc.target/i386/pr45397-4.c
new file mode 100644
index 0000000..9bf66b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr45397-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -mavx" } */
+
+
+signed char a[1024], b[1024];
+
+void
+foo (void)
+{
+ int i, s, t;
+ for (i = 0; i < 1024; i++)
+ { s = a[i]; t = b[i]; s += t; a[i] = s; }
+}
+
+void
+bar (void)
+{
+ int i;
+ for (i = 0; i < 1024; i++)
+ a[i] += b[i];
+}
+
+
+
+/* { dg-final { scan-tree-dump-times "VIEW_CONVERT_EXPR" 6 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "VEC_PACK_TRUNC_EXPR" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr47477-1.c
b/gcc/testsuite/gcc.target/i386/pr47477-1.c
new file mode 100644
index 0000000..a70ce87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr47477-1.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized -m32" } */
+
+
+void *
+add (void *a, void *b)
+{
+ return (void *)(__INTPTR_TYPE__) ((long long)(__INTPTR_TYPE__) a + ((long
long)(__INTPTR_TYPE__) b & ~1L));
+}
+
+void *
+bar (void *a, void *b)
+{
+ long long tmp = (long long)(__INTPTR_TYPE__) a + ((long
long)(__INTPTR_TYPE__) b & ~1L);
+ return (void *)(__INTPTR_TYPE__) tmp;
+}
+
+
+
+
+/* { dg-final { scan-tree-dump-times "\\(unsigned int\\)" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "& 4294967294" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\(void \\*\\)" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(long long int\\)" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "\\(int\\)" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
+
+