> -----Original Message-----
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Tuesday, November 05, 2013 8:18 PM
> To: Bin Cheng
> Cc: GCC Patches
> Subject: Re: [PATCH GCC]Simplify address expression in IVOPT
>
> On Tue, Nov 5, 2013 at 11:13 AM, bin.cheng <bin.ch...@arm.com> wrote:
> >
> >
> >> -----Original Message-----
> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> ow...@gcc.gnu.org] On Behalf Of bin.cheng
> >> Sent: Monday, November 04, 2013 4:35 PM
> >> To: 'Richard Biener'
> >> Cc: GCC Patches
> >> Subject: RE: [PATCH GCC]Simplify address expression in IVOPT
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> > Sent: Wednesday, October 30, 2013 10:46 PM
> >> > To: Bin Cheng
> >> > Cc: GCC Patches
> >> > Subject: Re: [PATCH GCC]Simplify address expression in IVOPT
> >> >
> >> > On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.ch...@arm.com>
> wrote:
> >> >
> >> > Hmm. I think you want what get_inner_reference_aff does, using the
> >> > return value of get_inner_reference as starting point for
> >> determine_base_object.
> >> > And you don't want to restrict yourselves so much on what exprs to
> >> process,
> >> > but only exclude DECL_Ps.
> >> Did you mean I should pass all address expressions to
> >> get_inner_reference_aff except the one with declaration operand?
> >> I am a little confused here why DECL_Ps should be handled specially?
> > Seems
> >> it can be handled properly by the get_* function. Anything important
> >> I missed?
> >>
> >> > Just amend get_inner_reference_aff to return the tree base object.
> >> I found it's inappropriate to do this because: functions like
> >> get_inner_reference* only accept reference expressions, while
> >> base_object has to be computed for other kinds of expressions too.
> >> Take gimple statement "a_1 = *ptr_1" as an example, the base passed
> >> to alloc_iv is
> > ptr_1;
> >> the base_object computed by determine_base_object is also ptr_1,
> >> which we can't rely on get_inner_reference* .
> >>
> >> Also It seems the comment before determine_base_object might be
> >> inaccurate:
> >> " Returns a memory object to that EXPR points." which I think is "
> >> Returns
> > a
> >> pointer pointing to the main memory object to that EXPR points."
> >> Right?
> >> >
> >> > Note that this isn't really "simplifying" but rather lowering all
> >> addresses.
> >> >
> >> > The other changes in this patch are unrelated, right?
> >> Right, I should have named this message like "refine cost computation
> >> of expressions in IVOPT" just as the patch file.
> >
> > Hi,
> > I updated the patch according to review comments. Now it lowers all
> > address expressions except ones with DECL_P to
> > get_inner_reference_aff, it also computes base_object which is used
later
> to ease determine_base_object.
> >
> > Bootstrap and test on x86/x86_64/arm on going, is it OK if tests pass?
>
> static struct iv *
> alloc_iv (tree base, tree step)
> {
> + tree expr = base;
> + tree base_object = base;
> struct iv *iv = XCNEW (struct iv);
> gcc_assert (step != NULL_TREE);
>
> + /* Lower all address expressions except ones with DECL_P as oeprand.
> + By doing this:
> + 1) More accurate cost can be computed for address expressions;
> + 2) Duplicate candidates won't be created for bases in different
> + forms, like &a[0] and &a. */ STRIP_NOPS (expr); if
> + (TREE_CODE (expr) == ADDR_EXPR && !DECL_P (TREE_OPERAND (expr, 0)))
> + {
> + aff_tree comb;
> + double_int size;
> + base_object = get_inner_reference_aff (TREE_OPERAND (expr, 0),
> + &comb, &size);
> + gcc_assert (base_object != NULL_TREE);
> + base = fold_convert (TREE_TYPE (base), aff_combination_to_tree
> (&comb));
> + }
> +
> iv->base = base;
> - iv->base_object = determine_base_object (base);
> + iv->base_object = determine_base_object (base_object);
>
>
> for less confusion do not introduce 'expr' but just base_object
> (s/expr/base_object/). Also can you split out this change (and the
related
> tree-affine.c one) from the rest?
>
> Ok with that change assuming it passes bootstrap & testing.
It passes test. I split it into two patches as attached.
Are these two OK?
Thanks,
bin
3-ivopt-expr_cost-a-20131106:
gcc/testsuite/ChangeLog
2013-11-06 Bin Cheng <bin.ch...@arm.com>
* gcc.dg/tree-ssa/loop-2.c: Refine check condition.
* gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
* gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.
2013-11-06 Bin Cheng <bin.ch...@arm.com>
* tree-ssa-loop-ivopts.c (alloc_iv): Lower address expressions.
* tree-affine.c (get_inner_reference_aff): Return base.
* tree-affine.h (get_inner_reference_aff): Change prototype.
3-ivopt-expr_cost-b-20131106:
2013-11-06 Bin Cheng <bin.ch...@arm.com>
* tree-ssa-loop-ivopts.c (get_shiftadd_cost): Check equality
using operand_equal_p.
(force_expr_to_var_cost): Refactor the code. Handle type
conversion.
(split_address_cost): Call force_expr_to_var_cost.
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (working copy)
@@ -27,7 +27,7 @@ void xxx(void)
/* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */
/* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */
/* 17 * iter should be strength reduced. */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (working copy)
@@ -7,7 +7,8 @@
extern char a[];
-/* Can not infer loop iteration from array -- exit test can not be replaced.
*/
+/* Can not infer loop iteration from array -- exit test can not be
+ replaced by the array address. */
void foo (unsigned int i_width, TYPE dst)
{
unsigned long long i = 0;
@@ -21,5 +22,5 @@ void foo (unsigned int i_width, TYPE dst)
}
}
-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1
"ivopts"} } */
/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c (revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c (working copy)
@@ -18,5 +18,5 @@ long foo(long* p, long* p2, int N1, int N2)
return s;
}
-/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1
"ivopts"} } */
/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c (revision 204117)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -924,11 +924,30 @@ determine_base_object (tree expr)
static struct iv *
alloc_iv (tree base, tree step)
{
+ tree base_object = base;
struct iv *iv = XCNEW (struct iv);
gcc_assert (step != NULL_TREE);
+ /* Lower all address expressions except ones with DECL_P as operand.
+ By doing this:
+ 1) More accurate cost can be computed for address expressions;
+ 2) Duplicate candidates won't be created for bases in different
+ forms, like &a[0] and &a. */
+ STRIP_NOPS (base_object);
+ if (TREE_CODE (base_object) == ADDR_EXPR
+ && !DECL_P (TREE_OPERAND (base_object, 0)))
+ {
+ aff_tree comb;
+ double_int size;
+ base_object = get_inner_reference_aff (TREE_OPERAND (base_object, 0),
+ &comb, &size);
+ gcc_assert (base_object != NULL_TREE);
+ base_object = build_fold_addr_expr (base_object);
+ base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb));
+ }
+
iv->base = base;
- iv->base_object = determine_base_object (base);
+ iv->base_object = determine_base_object (base_object);
iv->step = step;
iv->biv_p = false;
iv->have_use_for = false;
Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c (revision 204117)
+++ gcc/tree-affine.c (working copy)
@@ -874,10 +874,11 @@ debug_aff (aff_tree *val)
fprintf (stderr, "\n");
}
-/* Returns address of the reference REF in ADDR. The size of the accessed
- location is stored to SIZE. */
+/* Computes address of the reference REF in ADDR. The size of the accessed
+ location is stored to SIZE. Returns the ultimate containing object to
+ which REF refers. */
-void
+tree
get_inner_reference_aff (tree ref, aff_tree *addr, double_int *size)
{
HOST_WIDE_INT bitsize, bitpos;
@@ -904,6 +905,8 @@ get_inner_reference_aff (tree ref, aff_tree *addr,
aff_combination_add (addr, &tmp);
*size = double_int::from_shwi ((bitsize + BITS_PER_UNIT - 1) /
BITS_PER_UNIT);
+
+ return base;
}
/* Returns true if a region of size SIZE1 at position 0 and a region of
Index: gcc/tree-affine.h
===================================================================
--- gcc/tree-affine.h (revision 204117)
+++ gcc/tree-affine.h (working copy)
@@ -74,7 +74,7 @@ bool aff_combination_constant_multiple_p (aff_tree
void aff_combination_expand (aff_tree *, struct pointer_map_t **);
void tree_to_aff_combination_expand (tree, tree, aff_tree *,
struct pointer_map_t **);
-void get_inner_reference_aff (tree, aff_tree *, double_int *);
+tree get_inner_reference_aff (tree, aff_tree *, double_int *);
void free_affine_expand_cache (struct pointer_map_t **);
bool aff_comb_cannot_overlap_p (aff_tree *, double_int, double_int);
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c (revision 204117)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -3481,17 +3500,21 @@ get_shiftadd_cost (tree expr, enum machine_mode mo
int m = exact_log2 (int_cst_value (cst));
int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));
int sa_cost;
+ bool equal_p = false;
if (!(m >= 0 && m < maxm))
return false;
+ if (operand_equal_p (op1, mult, 0))
+ equal_p = true;
+
sa_cost = (TREE_CODE (expr) != MINUS_EXPR
? shiftadd_cost (speed, mode, m)
- : (mult == op1
+ : (equal_p
? shiftsub1_cost (speed, mode, m)
: shiftsub0_cost (speed, mode, m)));
res = new_cost (sa_cost, 0);
- res = add_costs (res, mult == op1 ? cost0 : cost1);
+ res = add_costs (res, equal_p ? cost0 : cost1);
STRIP_NOPS (multop);
if (!is_gimple_val (multop))
@@ -3585,30 +3608,14 @@ force_expr_to_var_cost (tree expr, bool speed)
op1 = TREE_OPERAND (expr, 1);
STRIP_NOPS (op0);
STRIP_NOPS (op1);
-
- if (is_gimple_val (op0))
- cost0 = no_cost;
- else
- cost0 = force_expr_to_var_cost (op0, speed);
-
- if (is_gimple_val (op1))
- cost1 = no_cost;
- else
- cost1 = force_expr_to_var_cost (op1, speed);
-
break;
+ case NOP_EXPR:
+ case CONVERT_EXPR:
case NEGATE_EXPR:
op0 = TREE_OPERAND (expr, 0);
STRIP_NOPS (op0);
op1 = NULL_TREE;
-
- if (is_gimple_val (op0))
- cost0 = no_cost;
- else
- cost0 = force_expr_to_var_cost (op0, speed);
-
- cost1 = no_cost;
break;
default:
@@ -3616,6 +3623,18 @@ force_expr_to_var_cost (tree expr, bool speed)
return new_cost (target_spill_cost[speed], 0);
}
+ if (op0 == NULL_TREE
+ || (is_gimple_val (op0) && (TREE_CODE (op0) != ADDR_EXPR)))
+ cost0 = no_cost;
+ else
+ cost0 = force_expr_to_var_cost (op0, speed);
+
+ if (op1 == NULL_TREE
+ || (is_gimple_val (op1) && (TREE_CODE (op1) != ADDR_EXPR)))
+ cost1 = no_cost;
+ else
+ cost1 = force_expr_to_var_cost (op1, speed);
+
mode = TYPE_MODE (TREE_TYPE (expr));
switch (TREE_CODE (expr))
{
@@ -3639,7 +3658,18 @@ force_expr_to_var_cost (tree expr, bool speed)
speed, &sa_cost))
return sa_cost;
}
+
break;
+ case NOP_EXPR:
+ case CONVERT_EXPR:
+ {
+ tree inner_mode, outer_mode;
+ outer_mode = TREE_TYPE (expr);
+ inner_mode = TREE_TYPE (op0);
+ cost = new_cost (convert_cost (TYPE_MODE (outer_mode),
+ TYPE_MODE (inner_mode), speed), 0);
+ }
+ break;
case MULT_EXPR:
if (cst_and_fits_in_hwi (op0))
@@ -3713,7 +3743,7 @@ split_address_cost (struct ivopts_data *data,
*var_present = true;
fd_ivopts_data = data;
walk_tree (&addr, find_depends, depends_on, NULL);
- return new_cost (target_spill_cost[data->speed], 0);
+ return force_expr_to_var_cost (addr, data->speed);
}
*offset += bitpos / BITS_PER_UNIT;