Hi All, IVOPTS normally uses affine trees to perform comparisons between different IVs, but these seem to have been missing in two key spots and instead normal tree equivalencies used.
In some cases where we have a structural equivalence but not a signedness equivalencies we end up generating both a signed and unsigned IV for the same candidate. This happens quite a lot with fortran but can also happen in C because this came code is unable to figure out when one expression is a multiple of another. As an example in the attached testcase we get: Initial set of candidates: cost: 24 (complexity 3) reg_cost: 9 cand_cost: 15 cand_group_cost: 0 (complexity 3) candidates: 1, 6, 8 group:0 --> iv_cand:6, cost=(0,1) group:1 --> iv_cand:1, cost=(0,0) group:2 --> iv_cand:8, cost=(0,1) group:3 --> iv_cand:8, cost=(0,1) invariant variables: 6 invariant expressions: 1, 2 <Invariant Expressions>: inv_expr 1: stride.3_27 * 4 inv_expr 2: (unsigned long) stride.3_27 * 4 These end up being used in the same group: Group 1: cand cost compl. inv.expr. inv.vars 1 0 0 NIL; 6 2 0 0 NIL; 6 3 0 0 NIL; 6 which ends up with IV opts picking the signed and unsigned IVs: Improved to: cost: 24 (complexity 3) reg_cost: 9 cand_cost: 15 cand_group_cost: 0 (complexity 3) candidates: 1, 6, 8 group:0 --> iv_cand:6, cost=(0,1) group:1 --> iv_cand:1, cost=(0,0) group:2 --> iv_cand:8, cost=(0,1) group:3 --> iv_cand:8, cost=(0,1) invariant variables: 6 invariant expressions: 1, 2 and so generates the same IV as both signed and unsigned: ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) ;; pred: 28 [always] count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE) ;; 25 [always] count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE) # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> ... ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' ;; pred: 22 [always] count:95443719 (estimated locally, freq 25.8909) (FALLTHRU) ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE) ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE) # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> ivtmp.22_82 = ivtmp.22_41 + 1; ivtmp.26_72 = ivtmp.26_51 + _80; ivtmp.28_98 = ivtmp.28_90 + _39; These two IVs are always used as unsigned, so IV ops generates: _73 = stride.3_27 * 4; _80 = (unsigned long) _73; _54 = (unsigned long) stride.3_27; _39 = _54 * 4; Which means that in e.g. exchange2 we generate a lot of duplicate code. This is because candidate 6 and 8 are structurally equivalent but have different signs. This patch changes it so that if you have two IVs that are affine equivalent to just pick one over the other. IV already has code for this, so the patch just uses affine trees instead of tree for the check. With it we get: <Invariant Expressions>: inv_expr 1: stride.3_27 * 4 <Group-candidate Costs>: Group 0: cand cost compl. inv.expr. inv.vars 5 0 2 NIL; NIL; 6 0 3 NIL; NIL; Group 1: cand cost compl. inv.expr. inv.vars 1 0 0 NIL; 6 2 0 0 NIL; 6 3 0 0 NIL; 6 4 0 0 NIL; 6 Initial set of candidates: cost: 16 (complexity 3) reg_cost: 6 cand_cost: 10 cand_group_cost: 0 (complexity 3) candidates: 1, 6 group:0 --> iv_cand:6, cost=(0,3) group:1 --> iv_cand:1, cost=(0,0) invariant variables: 6 invariant expressions: 1 The two patches together results in a 10% performance increase in exchange2 in SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile time. There's also a 5% performance improvement in fotonik3d and similar reduction in binary size. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: PR tree-optimization/114932 * tree-affine.cc (aff_combination_constant_multiple_p): Take zero offsets into account. * tree-ssa-loop-ivopts.cc (affine_compare_eq): New. (record_group_use): Use it. (constant_multiple_of): Also check equality under aff_combination_constant_multiple_p. gcc/testsuite/ChangeLog: PR tree-optimization/114932 * gfortran.dg/addressing-modes_2.f90: New test. --- diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 new file mode 100644 index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae --- /dev/null +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 @@ -0,0 +1,20 @@ +! { dg-do compile { target aarch64-*-* } } +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } + +module a +integer, parameter :: b=3, c=b +contains +subroutine d(block) +integer f, col , block(:, :, :), e +do f = 1, c + do col = 1, c + block(:f, :, e()) = do + end do + end do + end + end + +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } } + diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644 --- a/gcc/tree-affine.cc +++ b/gcc/tree-affine.cc @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div, &mult_set, mult)) return false; + /* Everything is a multiple of 0, which means we shoudn't enforce that + mult_set is checked, since that forced the only valid multiple of + val and div to be 0 whereas 1 is also possible. */ + if (known_eq (val->offset, 0) + && known_eq (div->offset, 0)) + mult_set = false; + for (i = 0; i < div->n; i++) { class aff_comb_elt *elt diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644 --- a/gcc/tree-ssa-loop-ivopts.cc +++ b/gcc/tree-ssa-loop-ivopts.cc @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop) return exit; } +/* Compares the given affine tree LEFT with the tree expression RIGHT and return + whether they are the same under affine equality. */ + +static bool +affine_compare_eq (aff_tree &left, tree right) +{ + aff_tree aff_right; + tree_to_aff_combination (right, TREE_TYPE (right), &aff_right); + aff_combination_scale (&aff_right, -1); + aff_combination_add (&aff_right, &left); + return aff_combination_zero_p (&aff_right); +} + /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one of the arguments of each expression is a constant and that the type of the @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p, tree addr_base = NULL; struct iv_group *group = NULL; poly_uint64 addr_offset = 0; + aff_tree iv_step, iv_addr_base; + + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step); /* Record non address type use in a new group. */ if (address_p (type)) @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p, tree addr_toffset; split_constant_offset (iv->base, &addr_base, &addr_toffset); addr_offset = int_cst_value (addr_toffset); + tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base); for (i = 0; i < data->vgroups.length (); i++) { struct iv_use *use; @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p, /* Check if it has the same stripped base and step. */ if (operand_equal_p (iv->base_object, use->iv->base_object, 0) - && operand_equal_p (iv->step, use->iv->step, 0) - && operand_equal_p (addr_base, use->addr_base, 0)) + && affine_compare_eq (iv_step, use->iv->step) + && affine_compare_eq (iv_addr_base, use->addr_base)) break; } if (i == data->vgroups.length ()) @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul) return true; } + aff_tree aff_top, aff_bot; + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); + poly_widest_int poly_mul; + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) + && poly_mul.is_constant (mul)) + return true; + code = TREE_CODE (top); switch (code) { --
diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 new file mode 100644 index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae --- /dev/null +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 @@ -0,0 +1,20 @@ +! { dg-do compile { target aarch64-*-* } } +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } + +module a +integer, parameter :: b=3, c=b +contains +subroutine d(block) +integer f, col , block(:, :, :), e +do f = 1, c + do col = 1, c + block(:f, :, e()) = do + end do + end do + end + end + +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } } +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } } + diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644 --- a/gcc/tree-affine.cc +++ b/gcc/tree-affine.cc @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div, &mult_set, mult)) return false; + /* Everything is a multiple of 0, which means we shoudn't enforce that + mult_set is checked, since that forced the only valid multiple of + val and div to be 0 whereas 1 is also possible. */ + if (known_eq (val->offset, 0) + && known_eq (div->offset, 0)) + mult_set = false; + for (i = 0; i < div->n; i++) { class aff_comb_elt *elt diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644 --- a/gcc/tree-ssa-loop-ivopts.cc +++ b/gcc/tree-ssa-loop-ivopts.cc @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop) return exit; } +/* Compares the given affine tree LEFT with the tree expression RIGHT and return + whether they are the same under affine equality. */ + +static bool +affine_compare_eq (aff_tree &left, tree right) +{ + aff_tree aff_right; + tree_to_aff_combination (right, TREE_TYPE (right), &aff_right); + aff_combination_scale (&aff_right, -1); + aff_combination_add (&aff_right, &left); + return aff_combination_zero_p (&aff_right); +} + /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one of the arguments of each expression is a constant and that the type of the @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p, tree addr_base = NULL; struct iv_group *group = NULL; poly_uint64 addr_offset = 0; + aff_tree iv_step, iv_addr_base; + + tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step); /* Record non address type use in a new group. */ if (address_p (type)) @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p, tree addr_toffset; split_constant_offset (iv->base, &addr_base, &addr_toffset); addr_offset = int_cst_value (addr_toffset); + tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base); for (i = 0; i < data->vgroups.length (); i++) { struct iv_use *use; @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p, /* Check if it has the same stripped base and step. */ if (operand_equal_p (iv->base_object, use->iv->base_object, 0) - && operand_equal_p (iv->step, use->iv->step, 0) - && operand_equal_p (addr_base, use->addr_base, 0)) + && affine_compare_eq (iv_step, use->iv->step) + && affine_compare_eq (iv_addr_base, use->addr_base)) break; } if (i == data->vgroups.length ()) @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul) return true; } + aff_tree aff_top, aff_bot; + tree_to_aff_combination (top, TREE_TYPE (top), &aff_top); + tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot); + poly_widest_int poly_mul; + if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul) + && poly_mul.is_constant (mul)) + return true; + code = TREE_CODE (top); switch (code) {