ping
> -----Original Message----- > From: Tamar Christina <tamar.christ...@arm.com> > Sent: Monday, September 23, 2024 8:41 AM > To: gcc-patches@gcc.gnu.org > Cc: nd <n...@arm.com>; rguent...@suse.de; j...@ventanamicro.com > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > comparing IVs during candidate selection [PR114932] > > ping > > > -----Original Message----- > > From: Tamar Christina > > Sent: Tuesday, September 10, 2024 8:57 PM > > To: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org > > Cc: nd <n...@arm.com>; rguent...@suse.de; j...@ventanamicro.com > > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when > > comparing IVs during candidate selection [PR114932] > > > > ping > > > > > -----Original Message----- > > > From: Tamar Christina <tamar.christ...@arm.com> > > > Sent: Tuesday, August 20, 2024 2:06 PM > > > To: gcc-patches@gcc.gnu.org > > > Cc: nd <n...@arm.com>; rguent...@suse.de; j...@ventanamicro.com > > > Subject: [PATCH 2/2]middle-end: use two's complement equality when > > comparing > > > IVs during candidate selection [PR114932] > > > > > > Hi All, > > > > > > IVOPTS normally uses affine trees to perform comparisons between different > IVs, > > > but these seem to have been missing in two key spots and instead normal > > > tree > > > equivalencies used. > > > > > > In some cases where we have a two-complements equivalence but not a strict > > > signedness equivalencies we end up generating both a signed and unsigned > > > IV > for > > > the same candidate. > > > > > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ. This flag > will > > > check if the operands would produce the same bit values after the > computations > > > even if the final sign is different. > > > > > > This happens quite a lot with fortran but can also happen in C because > > > this > came > > > code is unable to figure out when one expression is a multiple of another. > > > > > > As an example in the attached testcase we get: > > > > > > Initial set of candidates: > > > cost: 24 (complexity 3) > > > reg_cost: 9 > > > cand_cost: 15 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6, 8 > > > group:0 --> iv_cand:6, cost=(0,1) > > > group:1 --> iv_cand:1, cost=(0,0) > > > group:2 --> iv_cand:8, cost=(0,1) > > > group:3 --> iv_cand:8, cost=(0,1) > > > invariant variables: 6 > > > invariant expressions: 1, 2 > > > > > > <Invariant Expressions>: > > > inv_expr 1: stride.3_27 * 4 > > > inv_expr 2: (unsigned long) stride.3_27 * 4 > > > > > > These end up being used in the same group: > > > > > > Group 1: > > > cand cost compl. inv.expr. inv.vars > > > 1 0 0 NIL; 6 > > > 2 0 0 NIL; 6 > > > 3 0 0 NIL; 6 > > > > > > which ends up with IV opts picking the signed and unsigned IVs: > > > > > > Improved to: > > > cost: 24 (complexity 3) > > > reg_cost: 9 > > > cand_cost: 15 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6, 8 > > > group:0 --> iv_cand:6, cost=(0,1) > > > group:1 --> iv_cand:1, cost=(0,0) > > > group:2 --> iv_cand:8, cost=(0,1) > > > group:3 --> iv_cand:8, cost=(0,1) > > > invariant variables: 6 > > > invariant expressions: 1, 2 > > > > > > and so generates the same IV as both signed and unsigned: > > > > > > ;; basic block 21, loop depth 3, count 214748368 (estimated locally, > > > freq > > > 58.2545), maybe hot > > > ;; prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED) > > > ;; pred: 28 [always] count:23622320 (estimated locally, freq > > > 6.4080) > > > (FALLTHRU,EXECUTABLE) > > > ;; 25 [always] count:191126046 (estimated locally, freq > > > 51.8465) > > > (FALLTHRU,DFS_BACK,EXECUTABLE) > > > # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)> > > > # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)> > > > # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)> > > > # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)> > > > > > > ... > > > > > > ;; basic block 24, loop depth 3, count 214748366 (estimated locally, > > > freq > > > 58.2545), maybe hot > > > ;; prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)' > > > ;; pred: 22 [always] count:95443719 (estimated locally, freq > > > 25.8909) > > > (FALLTHRU) > > > ;; 21 [33.3% (guessed)] count:71582790 (estimated > > > locally, freq > > 19.4182) > > > (TRUE_VALUE,EXECUTABLE) > > > ;; 31 [33.3% (guessed)] count:47721860 (estimated > > > locally, freq > > 12.9455) > > > (TRUE_VALUE,EXECUTABLE) > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)> > > > ivtmp.22_82 = ivtmp.22_41 + 1; > > > ivtmp.26_72 = ivtmp.26_51 + _80; > > > ivtmp.28_98 = ivtmp.28_90 + _39; > > > > > > These two IVs are always used as unsigned, so IV ops generates: > > > > > > _73 = stride.3_27 * 4; > > > _80 = (unsigned long) _73; > > > _54 = (unsigned long) stride.3_27; > > > _39 = _54 * 4; > > > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code. > > > > > > This is because candidate 6 and 8 are equivalent under two's complement > > > but > > have > > > different signs. > > > > > > This patch changes it so that if you have two IVs that are affine > > > equivalent to > > > just pick one over the other. IV already has code for this, so the patch > > > just > > > uses affine trees instead of tree for the check. > > > > > > With it we get: > > > > > > <Invariant Expressions>: > > > inv_expr 1: stride.3_27 * 4 > > > > > > <Group-candidate Costs>: > > > Group 0: > > > cand cost compl. inv.expr. inv.vars > > > 5 0 2 NIL; NIL; > > > 6 0 3 NIL; NIL; > > > > > > Group 1: > > > cand cost compl. inv.expr. inv.vars > > > 1 0 0 NIL; 6 > > > 2 0 0 NIL; 6 > > > 3 0 0 NIL; 6 > > > 4 0 0 NIL; 6 > > > > > > Initial set of candidates: > > > cost: 16 (complexity 3) > > > reg_cost: 6 > > > cand_cost: 10 > > > cand_group_cost: 0 (complexity 3) > > > candidates: 1, 6 > > > group:0 --> iv_cand:6, cost=(0,3) > > > group:1 --> iv_cand:1, cost=(0,0) > > > invariant variables: 6 > > > invariant expressions: 1 > > > > > > The two patches together results in a 10% performance increase in > > > exchange2 > in > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in > > > compile > > > time. There's also a 5% performance improvement in fotonik3d and similar > > > reduction in binary size. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > -m32, -m64 and no issues. > > > > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/114932 > > > * fold-const.cc (operand_compare::operand_equal_p): Use it. > > > (operand_compare::verify_hash_value): Likewise. > > > (test_operand_equality::test): New. > > > (fold_const_cc_tests): Use it. > > > * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ. > > > * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/114932 > > > * gfortran.dg/addressing-modes_2.f90: New test. > > > > > > --- > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > > index > > > > > > 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b > > > 3301e21e038e3e41 100644 > > > --- a/gcc/fold-const.cc > > > +++ b/gcc/fold-const.cc > > > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0, > > > const_tree arg0, > > > return tree_int_cst_equal (arg0, arg1); > > > } > > > > > > - if (!(flags & OEP_ADDRESS_OF)) > > > + if ((flags & OEP_STRUCTURAL_EQ) > > > + && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1))) > > > + { > > > + const_tree t_arg0 = arg0; > > > + const_tree t_arg1 = arg1; > > > + STRIP_NOPS (arg0); > > > + STRIP_NOPS (arg1); > > > + /* Only recurse if the conversion was one that was valid to strip. > > > */ > > > + if (t_arg0 != arg0 || t_arg1 != arg1) > > > + return operand_equal_p (type0, arg0, type1, arg1, flags); > > > + } > > > + > > > + if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ))) > > > { > > > /* If both types don't have the same signedness, then we can't > > > consider > > > them equal. We must check this before the STRIP_NOPS calls > > > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0, > > > const_tree arg0, > > > if (flags & (OEP_ONLY_CONST | OEP_BITWISE)) > > > return false; > > > > > > + /* Check if we are checking an operation where the two's compliment > > > bitwise > > > + representation of the result is not the same between signed and > > > + unsigned arithmetic. */ > > > + switch (TREE_CODE (arg0)) > > > + { > > > + case PLUS_EXPR: > > > + case MINUS_EXPR: > > > + case MULT_EXPR: > > > + break; > > > + > > > + default: > > > + /* Clear the structural comparison bit. */ > > > + flags &= ~OEP_STRUCTURAL_EQ; > > > + break; > > > + } > > > + > > > /* Define macros to test an operand from arg0 and arg1 for equality and a > > > variant that allows null and views null as being different from any > > > non-null value. In the latter case, if either is null, the both > > > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree > > > arg0, const_tree arg1, > > > { > > > if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK)) > > > { > > > - if (arg0 != arg1 && !(flags & OEP_DECL_NAME)) > > > + if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | > > > OEP_STRUCTURAL_EQ))) > > > { > > > inchash::hash hstate0 (0), hstate1 (0); > > > hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK); > > > @@ -17357,6 +17385,95 @@ test_arithmetic_folding () > > > x); > > > } > > > > > > +namespace test_operand_equality { > > > + > > > +/* Verify structural equality. */ > > > + > > > +/* Execute fold_vec_perm_cst unit tests. */ > > > + > > > +static void > > > +test () > > > +{ > > > + tree stype = integer_type_node; > > > + tree utype = unsigned_type_node; > > > + tree x = create_tmp_var_raw (stype, "x"); > > > + tree y = create_tmp_var_raw (stype, "y"); > > > + tree z = create_tmp_var_raw (stype, "z"); > > > + tree four = build_int_cst (stype, 4); > > > + tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y); > > > + tree rhs1 = fold_convert (stype, > > > + fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, y))); > > > + > > > + /* (int)((unsigned x) + (unsigned y)) == x + y. */ > > > + ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0)); > > > + > > > + /* (int)(unsigned) x == x. */ > > > + tree lhs2 = build1 (NOP_EXPR, stype, > > > + build1 (NOP_EXPR, utype, x)); > > > + tree rhs2 = x; > > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ)); > > > + ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0)); > > > + > > > + /* (unsigned x) + (unsigned y) == x + y. */ > > > + tree lhs3 = lhs1; > > > + tree rhs3 = fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, y)); > > > + ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0)); > > > + > > > + /* (unsigned x) / (unsigned y) == x / y. */ > > > + tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);; > > > + tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, y)); > > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0)); > > > + > > > + /* (unsigned x) / 4 == x / 4. */ > > > + tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);; > > > + tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, four)); > > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0)); > > > + > > > + /* (unsigned x) + 4 == x + 4. */ > > > + tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four); > > > + tree rhs6 = fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, four)); > > > + ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0)); > > > + > > > + /* (unsigned x) + 4 == 4 + x. */ > > > + tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x); > > > + tree rhs7 = fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, four)); > > > + ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0)); > > > + > > > + /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z. */ > > > + tree lhs8 = fold_build2 (PLUS_EXPR, stype, > > > + fold_build2 (MULT_EXPR, stype, > > > + fold_build2 (PLUS_EXPR, stype, four, x), > > > + y), > > > + z); > > > + tree rhs8 = fold_build2 (MULT_EXPR, utype, > > > + fold_build2 (PLUS_EXPR, utype, > > > + fold_convert (utype, x), > > > + fold_convert (utype, four)), > > > + fold_convert (utype, y)); > > > + rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z); > > > + ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ)); > > > + ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0)); > > > +} > > > +} > > > + > > > namespace test_fold_vec_perm_cst { > > > > > > /* Build a VECTOR_CST corresponding to VMODE, and has > > > @@ -18150,6 +18267,7 @@ fold_const_cc_tests () > > > test_vector_folding (); > > > test_vec_duplicate_folding (); > > > test_fold_vec_perm_cst::test (); > > > + test_operand_equality::test (); > > > } > > > > > > } // namespace selftest > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > new file mode 100644 > > > index > > > > > > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c > > > 2e24a4973c8539fae > > > --- /dev/null > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 > > > @@ -0,0 +1,20 @@ > > > +! { dg-do compile { target aarch64-*-* } } > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" } > > > + > > > +module a > > > +integer, parameter :: b=3, c=b > > > +contains > > > +subroutine d(block) > > > +integer f, col , block(:, :, :), e > > > +do f = 1, c > > > + do col = 1, c > > > + block(:f, :, e()) = do > > > + end do > > > + end do > > > + end > > > + end > > > + > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 > > > IVs:} > ivopts > > } > > > } > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, > > > 2 IVs:} 1 > > > ivopts } } > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, > > > 1 IVs:} 1 > > > ivopts } } > > > + > > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h > > > index > > > > > > 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0 > > > 7d133bc393aa4a7 100644 > > > --- a/gcc/tree-core.h > > > +++ b/gcc/tree-core.h > > > @@ -962,7 +962,9 @@ enum operand_equal_flag { > > > /* In conjunction with OEP_LEXICOGRAPHIC considers names of > > > declarations > > > of the same kind. Used to compare VLA bounds involving parameters > > > across redeclarations of the same function. */ > > > - OEP_DECL_NAME = 512 > > > + OEP_DECL_NAME = 512, > > > + /* Check for structural equivalence and ignore NOP_CONVERT casts. */ > > > + OEP_STRUCTURAL_EQ = 1024 > > > }; > > > > > > /* Enum and arrays used for tree allocation stats. > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc > > > index > > > > > > c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3 > > > ed5e614b7ccc702f 100644 > > > --- a/gcc/tree-ssa-loop-ivopts.cc > > > +++ b/gcc/tree-ssa-loop-ivopts.cc > > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree > > > *use_p, > > > > > > /* Check if it has the same stripped base and step. */ > > > if (operand_equal_p (iv->base_object, use->iv->base_object, 0) > > > - && operand_equal_p (iv->step, use->iv->step, 0) > > > - && operand_equal_p (addr_base, use->addr_base, 0)) > > > + && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ) > > > + && operand_equal_p (addr_base, use->addr_base, > > > OEP_STRUCTURAL_EQ)) > > > break; > > > } > > > if (i == data->vgroups.length ()) > > > > > > > > > > > > > > > --