Hi Bill,
Thank you for the review and the offer to help.
On 10/01/13 15:36, Bill Schmidt wrote:
On Tue, 2013-10-01 at 08:17 -0500, Bill Schmidt wrote:
On Tue, 2013-10-01 at 12:19 +0200, Richard Biener wrote:
On Wed, Sep 25, 2013 at 1:37 PM, Yufeng Zhang<yufeng.zh...@arm.com> wrote:
Hello,
Please find the updated version of the patch in the attachment. It has
addressed the previous comments and also included some changes in order to
pass the bootstrapping on x86_64.
It's also passed the regtest on arm-none-eabi and aarch64-none-elf.
It will also fix the test failure as reported here:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01317.html
OK for the trunk?
+ where n is a 32-bit unsigned int and pointer are 64-bit long. In this
+ case, the gimple for (n - 1) is:
+
+ _2 = n_1(D) + 4294967295; // 0xFFFFFFFF
+
+ and it is wrong to multiply the large constant by 4 in the 64-bit space. */
+
+static bool
+safe_to_multiply_p (tree type, double_int cst)
+{
+ if (TYPE_UNSIGNED (type)
+&& ! double_int_fits_to_tree_p (signed_type_for (type), cst))
+ return false;
+
+ return true;
+}
This looks wrong. The only relevant check is as whether the
multiplication overflows the original type as you miss the implicit
truncation that happens. Which is something you don't know
unless you know the value. It definitely isn't a property of a type
and a constant but the property of two constants and a type.
Or the predicate has a wrong name.
The use of get_unwidened in this core routine looks like this is
all happening in the wrong place and we should have picked up
another candidate for this instead? I'm sure Bill will know more here.
I'm not happy with how this patch is progressing. Without having looked
too deeply, this might be better handled earlier when determining which
casts are safe to use in building candidates. What you have here seems
more like closing the barn door after the horse got out. Maybe that's
the only solution, but it doesn't seem likely.
Another problem is that your test case isn't testing anything except
that the compiler doesn't crash. That isn't sufficient as a regression
test.
I'll spend some time looking at this to see if I can find a better
approach. It might be a day or two before I can get to it. In addition
to the included test case, are there any other cases you've found that I
should be concerned with?
To help me investigate this without having to build a cross compiler,
could you please compile your test case (without the patch applied)
using -fdump-tree-reassoc2 -fdump-tree-slsr-details and send me the
generated dump files?
The issue is not specific to AArch64; please find the attached dumps
generated from the x86-64 gcc by compiling gcc.dg/tree-ssa/slsr-39.c.
W.r.t your comment in the other email about adding a test to verify the
expected gimple, I think the existing test gcc.dg/tree-ssa/slsr-39.c is
sufficient. The test currently fails on both AArch64 and x86-64, and
presumably also fails on any other 64-bit target where pointer is 64-bit
and int is 32-bit size. The patch I proposed is to fix this issue and
gcc.dg/tree-ssa/slsr-39.c itself shall be a good regression test (with
specific verification on gimple ir).
The new test proposed in this patch is to regtest the issue my original
patch has, which is a runtime failure due to incorrect optimization.
I'll address other comments in separate emails.
Thanks,
Yufeng
;; Function foo (foo, funcdef_no=0, decl_uid=1722, symbol_order=0)
;; 1 loops found
;;
;; Loop 0
;; header 0, latch 1
;; depth 0, outer -1
;; nodes: 0 1 2
;; 2 succs { 1 }
foo (int[50] * a2, int v1)
{
int j;
long unsigned int _3;
long unsigned int _4;
int[50] * _6;
int _11;
int _12;
int _13;
<bb 2>:
j_2 = v1_1(D) + 5;
_3 = (long unsigned int) j_2;
_4 = _3 * 200;
_6 = a2_5(D) + _4;
j_7 = v1_1(D) + 6;
*_6[j_2] = j_2;
*_6[j_7] = j_2;
_11 = v1_1(D) + 4;
_12 = *_6[_11];
_13 = _12 + 1;
*_6[_11] = _13;
return;
}
;; Function foo (foo, funcdef_no=0, decl_uid=1722, symbol_order=0)
;; 1 loops found
;;
;; Loop 0
;; header 0, latch 1
;; depth 0, outer -1
;; nodes: 0 1 2
;; 2 succs { 1 }
Strength reduction candidate vector:
1 [2] j_2 = v1_1(D) + 5;
ADD : v1_1(D) + (5 * 1) : int
basis: 0 dependent: 5 sibling: 0
next-interp: 0 dead-savings: 0
2 [2] _3 = (long unsigned int) j_2;
ADD : v1_1(D) + (5 * 1) : long unsigned int
basis: 0 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 0
3 [2] _4 = _3 * 200;
MULT : (v1_1(D) + 5) * 200 : long unsigned int
basis: 0 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 1
4 [2] _6 = a2_5(D) + _4;
ADD : a2_5(D) + (1 * _4) : int[50] *
basis: 0 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 0
5 [2] j_7 = v1_1(D) + 6;
ADD : v1_1(D) + (6 * 1) : int
basis: 1 dependent: 8 sibling: 0
next-interp: 0 dead-savings: 0
6 [2] *_6[j_2] = j_2;
REF : _6 + ((sizetype) j_2 * 4) + 0 : int[50] *
basis: 0 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 0
7 [2] *_6[j_7] = j_2;
REF : _6 + ((sizetype) j_7 * 4) + 0 : int[50] *
basis: 0 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 0
8 [2] _11 = v1_1(D) + 4;
ADD : v1_1(D) + (4 * 1) : int
basis: 5 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 0
9 [2] _12 = *_6[_11];
REF : _6 + ((sizetype) _11 * 4) + 0 : int[50] *
basis: 0 dependent: 11 sibling: 0
next-interp: 0 dead-savings: 0
10 [2] _13 = _12 + 1;
ADD : _12 + (1 * 1) : int
basis: 0 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 0
11 [2] *_6[_11] = _13;
REF : _6 + ((sizetype) _11 * 4) + 0 : int[50] *
basis: 9 dependent: 0 sibling: 0
next-interp: 0 dead-savings: 0
Strength reduction candidate chains:
_6 -> 6 -> 11 -> 9 -> 7
v1_1(D) -> 1 -> 8 -> 5 -> 3 -> 2
a2_5(D) -> 4
_12 -> 10
Processing dependency tree rooted at 1.
Processing dependency tree rooted at 9.
foo (int[50] * a2, int v1)
{
int j;
long unsigned int _3;
long unsigned int _4;
int[50] * _6;
int _11;
int _12;
int _13;
sizetype _15;
sizetype _16;
int[50] * _17;
sizetype _18;
sizetype _19;
int[50] * _20;
<bb 2>:
j_2 = v1_1(D) + 5;
_3 = (long unsigned int) j_2;
_4 = _3 * 200;
_6 = a2_5(D) + _4;
j_7 = v1_1(D) + 6;
*_6[j_2] = j_2;
*_6[j_7] = j_2;
_11 = v1_1(D) + 4;
_15 = (sizetype) _11;
_16 = _15 * 4;
_17 = _6 + _16;
_12 = MEM[(int[50] *)_17];
_13 = _12 + 1;
_18 = (sizetype) _11;
_19 = _18 * 4;
_20 = _6 + _19;
MEM[(int[50] *)_20] = _13;
return;
}