ping
> -----Original Message----- > From: Tamar Christina > Sent: Tuesday, September 10, 2024 8:57 PM > To: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org > Cc: nd <n...@arm.com>; rguent...@suse.de; j...@ventanamicro.com > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > comparing IVs during candidate selection [PR114932] > > ping > > > -----Original Message----- > > From: Tamar Christina <tamar.christ...@arm.com> > > Sent: Tuesday, August 20, 2024 2:06 PM > > To: gcc-patches@gcc.gnu.org > > Cc: nd <n...@arm.com>; rguent...@suse.de; j...@ventanamicro.com > > Subject: [PATCH 2/2]middle-end: use two's complement equality when > comparing > > IVs during candidate selection [PR114932] > > > > Hi All, > > > > IVOPTS normally uses affine trees to perform comparisons between different > > IVs, > > but these seem to have been missing in two key spots and instead normal tree > > equivalencies used. > > > > In some cases where we have a two-complements equivalence but not a strict > > signedness equivalencies we end up generating both a signed and unsigned IV > > for > > the same candidate. > > > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ. This flag > > will > > check if the operands would produce the same bit values after the > > computations > > even if the final sign is different. > > > > This happens quite a lot with fortran but can also happen in C because this > > came > > code is unable to figure out when one expression is a multiple of another. > > > > As an example in the attached testcase we get: > > > > Initial set of candidates: > > cost: 24 (complexity 3) > > reg_cost: 9 > > cand_cost: 15 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6, 8 > > group:0 --> iv_cand:6, cost=(0,1) > > group:1 --> iv_cand:1, cost=(0,0) > > group:2 --> iv_cand:8, cost=(0,1) > > group:3 --> iv_cand:8, cost=(0,1) > > invariant variables: 6 > > invariant expressions: 1, 2 > > > > <Invariant Expressions>: > > inv_expr 1: stride.3_27 * 4 > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > These end up being used in the same group: > > > > Group 1: > > cand cost compl. inv.expr. inv.vars > > 1 0 0 NIL; 6 > > 2 0 0 NIL; 6 > > 3 0 0 NIL; 6 > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > Improved to: > > cost: 24 (complexity 3) > > reg_cost: 9 > > cand_cost: 15 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6, 8 > > group:0 --> iv_cand:6, cost=(0,1) > > group:1 --> iv_cand:1, cost=(0,0) > > group:2 --> iv_cand:8, cost=(0,1) > > group:3 --> iv_cand:8, cost=(0,1) > > invariant variables: 6 > > invariant expressions: 1, 2 > > > > and so generates the same IV as both signed and unsigned: > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, freq > > 58.2545), maybe hot > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > ;; pred: 28 [always] count:23622320 (estimated locally, freq > > 6.4080) > > (FALLTHRU,EXECUTABLE) > > ;; 25 [always] count:191126046 (estimated locally, freq > > 51.8465) > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > ... > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, freq > > 58.2545), maybe hot > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > ;; pred: 22 [always] count:95443719 (estimated locally, freq > > 25.8909) > > (FALLTHRU) > > ;; 21 [33.3% (guessed)] count:71582790 (estimated locally, > > freq > 19.4182) > > (TRUE_VALUE,EXECUTABLE) > > ;; 31 [33.3% (guessed)] count:47721860 (estimated locally, > > freq > 12.9455) > > (TRUE_VALUE,EXECUTABLE) > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > ivtmp.22_82 = ivtmp.22_41 + 1; > > ivtmp.26_72 = ivtmp.26_51 + _80; > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > _73 = stride.3_27 * 4; > > _80 = (unsigned long) _73; > > _54 = (unsigned long) stride.3_27; > > _39 = _54 * 4; > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > This is because candidate 6 and 8 are equivalent under two's complement but > have > > different signs. > > > > This patch changes it so that if you have two IVs that are affine > > equivalent to > > just pick one over the other. IV already has code for this, so the patch > > just > > uses affine trees instead of tree for the check. > > > > With it we get: > > > > <Invariant Expressions>: > > inv_expr 1: stride.3_27 * 4 > > > > <Group-candidate Costs>: > > Group 0: > > cand cost compl. inv.expr. inv.vars > > 5 0 2 NIL; NIL; > > 6 0 3 NIL; NIL; > > > > Group 1: > > cand cost compl. inv.expr. inv.vars > > 1 0 0 NIL; 6 > > 2 0 0 NIL; 6 > > 3 0 0 NIL; 6 > > 4 0 0 NIL; 6 > > > > Initial set of candidates: > > cost: 16 (complexity 3) > > reg_cost: 6 > > cand_cost: 10 > > cand_group_cost: 0 (complexity 3) > > candidates: 1, 6 > > group:0 --> iv_cand:6, cost=(0,3) > > group:1 --> iv_cand:1, cost=(0,0) > > invariant variables: 6 > > invariant expressions: 1 > > > > The two patches together results in a 10% performance increase in exchange2 > > in > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > > compile > > time. There's also a 5% performance improvement in fotonik3d and similar > > reduction in binary size. > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > -m32, -m64 and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > PR tree-optimization/114932 > > * fold-const.cc (operand_compare::operand_equal_p): Use it. > > (operand_compare::verify_hash_value): Likewise. > > (test_operand_equality::test): New. > > (fold_const_cc_tests): Use it. > > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. > > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/114932 > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > --- > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index > > > 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b > > 3301e21e038e3e41 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0, > > const_tree arg0, > > return tree_int_cst_equal (arg0, arg1); > > } > > > > - if (!(flags & OEP_ADDRESS_OF)) > > + if ((flags & OEP_STRUCTURAL_EQ) > > + && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1))) > > + { > > + const_tree t_arg0 = arg0; > > + const_tree t_arg1 = arg1; > > + STRIP_NOPS (arg0); > > + STRIP_NOPS (arg1); > > + /* Only recurse if the conversion was one that was valid to strip. > > */ > > + if (t_arg0 != arg0 || t_arg1 != arg1) > > + return operand_equal_p (type0, arg0, type1, arg1, flags); > > + } > > + > > + if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ))) > > { > > /* If both types don't have the same signedness, then we can't > > consider > > them equal. We must check this before the STRIP_NOPS calls > > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0, > > const_tree arg0, > > if (flags & (OEP_ONLY_CONST | OEP_BITWISE)) > > return false; > > > > + /* Check if we are checking an operation where the two's compliment > > bitwise > > + representation of the result is not the same between signed and > > + unsigned arithmetic. */ > > + switch (TREE_CODE (arg0)) > > + { > > + case PLUS_EXPR: > > + case MINUS_EXPR: > > + case MULT_EXPR: > > + break; > > + > > + default: > > + /* Clear the structural comparison bit. */ > > + flags &= ~OEP_STRUCTURAL_EQ; > > + break; > > + } > > + > > /* Define macros to test an operand from arg0 and arg1 for equality and a > > variant that allows null and views null as being different from any > > non-null value. In the latter case, if either is null, the both > > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree > > arg0, const_tree arg1, > > { > > if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) > > { > > - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) > > + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | > > OEP_STRUCTURAL_EQ))) > > { > > inchash::hash hstate0 (0), hstate1 (0); > > hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); > > @@ -17357,6 +17385,95 @@ test_arithmetic_folding () > > x); > > } > > > > +namespace test_operand_equality { > > + > > +/* Verify structural equality. */ > > + > > +/* Execute fold_vec_perm_cst unit tests. */ > > + > > +static void > > +test () > > +{ > > + tree stype = integer_type_node; > > + tree utype = unsigned_type_node; > > + tree x = create_tmp_var_raw (stype, "x"); > > + tree y = create_tmp_var_raw (stype, "y"); > > + tree z = create_tmp_var_raw (stype, "z"); > > + tree four = build_int_cst (stype, 4); > > + tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y); > > + tree rhs1 = fold_convert (stype, > > + fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, y))); > > + > > + /* (int)((unsigned x) + (unsigned y)) == x + y. */ > > + ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0)); > > + > > + /* (int)(unsigned) x == x. */ > > + tree lhs2 = build1 (NOP_EXPR, stype, > > + build1 (NOP_EXPR, utype, x)); > > + tree rhs2 = x; > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ)); > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0)); > > + > > + /* (unsigned x) + (unsigned y) == x + y. */ > > + tree lhs3 = lhs1; > > + tree rhs3 = fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, y)); > > + ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0)); > > + > > + /* (unsigned x) / (unsigned y) == x / y. */ > > + tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);; > > + tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, y)); > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0)); > > + > > + /* (unsigned x) / 4 == x / 4. */ > > + tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);; > > + tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, four)); > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0)); > > + > > + /* (unsigned x) + 4 == x + 4. */ > > + tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four); > > + tree rhs6 = fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, four)); > > + ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0)); > > + > > + /* (unsigned x) + 4 == 4 + x. */ > > + tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x); > > + tree rhs7 = fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, four)); > > + ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0)); > > + > > + /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z. */ > > + tree lhs8 = fold_build2 (PLUS_EXPR, stype, > > + fold_build2 (MULT_EXPR, stype, > > + fold_build2 (PLUS_EXPR, stype, four, x), > > + y), > > + z); > > + tree rhs8 = fold_build2 (MULT_EXPR, utype, > > + fold_build2 (PLUS_EXPR, utype, > > + fold_convert (utype, x), > > + fold_convert (utype, four)), > > + fold_convert (utype, y)); > > + rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z); > > + ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ)); > > + ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0)); > > +} > > +} > > + > > namespace test_fold_vec_perm_cst { > > > > /* Build a VECTOR_CST corresponding to VMODE, and has > > @@ -18150,6 +18267,7 @@ fold_const_cc_tests () > > test_vector_folding (); > > test_vec_duplicate_folding (); > > test_fold_vec_perm_cst::test (); > > + test_operand_equality::test (); > > } > > > > } // namespace selftest > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > new file mode 100644 > > index > > > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > > 2e24a4973c8539fae > > --- /dev/null > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > @@ -0,0 +1,20 @@ > > +! { dg-do compile { target aarch64-*-* } } > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > + > > +module a > > +integer, parameter :: b=3, c=b > > +contains > > +subroutine d(block) > > +integer f, col , block(:, :, :), e > > +do f = 1, c > > + do col = 1, c > > + block(:f, :, e()) = do > > + end do > > + end do > > + end > > + end > > + > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 > > IVs:} ivopts > } > > } > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 > > IVs:} 1 > > ivopts } } > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 > > IVs:} 1 > > ivopts } } > > + > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > > index > > > 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0 > > 7d133bc393aa4a7 100644 > > --- a/gcc/tree-core.h > > +++ b/gcc/tree-core.h > > @@ -962,7 +962,9 @@ enum operand_equal_flag { > > /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations > > of the same kind. Used to compare VLA bounds involving parameters > > across redeclarations of the same function. */ > > - OEP_DECL_NAME = 512 > > + OEP_DECL_NAME = 512, > > + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ > > + OEP_STRUCTURAL_EQ = 1024 > > }; > > > > /* Enum and arrays used for tree allocation stats. > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > index > > > c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3 > > ed5e614b7ccc702f 100644 > > --- a/gcc/tree-ssa-loop-ivopts.cc > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree > > *use_p, > > > > /* Check if it has the same stripped base and step. */ > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > - && operand_equal_p (iv->step, use->iv->step, 0) > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) > > + && operand_equal_p (addr_base, use->addr_base, > > OEP_STRUCTURAL_EQ)) > > break; > > } > > if (i == data->vgroups.length ()) > > > > > > > > > > --