ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Tuesday, September 10, 2024 8:57 PM
> To: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <n...@arm.com>; rguent...@suse.de; j...@ventanamicro.com
> Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> comparing IVs during candidate selection [PR114932]
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina <tamar.christ...@arm.com>
> > Sent: Tuesday, August 20, 2024 2:06 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd <n...@arm.com>; rguent...@suse.de; j...@ventanamicro.com
> > Subject: [PATCH 2/2]middle-end: use two's complement equality when
> comparing
> > IVs during candidate selection [PR114932]
> >
> > Hi All,
> >
> > IVOPTS normally uses affine trees to perform comparisons between different 
> > IVs,
> > but these seem to have been missing in two key spots and instead normal tree
> > equivalencies used.
> >
> > In some cases where we have a two-complements equivalence but not a strict
> > signedness equivalencies we end up generating both a signed and unsigned IV 
> > for
> > the same candidate.
> >
> > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag 
> > will
> > check if the operands would produce the same bit values after the 
> > computations
> > even if the final sign is different.
> >
> > This happens quite a lot with fortran but can also happen in C because this 
> > came
> > code is unable to figure out when one expression is a multiple of another.
> >
> > As an example in the attached testcase we get:
> >
> > Initial set of candidates:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >    group:0 --> iv_cand:6, cost=(0,1)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >    group:2 --> iv_cand:8, cost=(0,1)
> >    group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > <Invariant Expressions>:
> > inv_expr 1:     stride.3_27 * 4
> > inv_expr 2:     (unsigned long) stride.3_27 * 4
> >
> > These end up being used in the same group:
> >
> > Group 1:
> > cand  cost    compl.  inv.expr.       inv.vars
> > 1     0       0       NIL;    6
> > 2     0       0       NIL;    6
> > 3     0       0       NIL;    6
> >
> > which ends up with IV opts picking the signed and unsigned IVs:
> >
> > Improved to:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >    group:0 --> iv_cand:6, cost=(0,1)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >    group:2 --> iv_cand:8, cost=(0,1)
> >    group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > and so generates the same IV as both signed and unsigned:
> >
> > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > 58.2545), maybe hot
> > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 
> > 6.4080)
> > (FALLTHRU,EXECUTABLE)
> > ;;                25 [always]  count:191126046 (estimated locally, freq 
> > 51.8465)
> > (FALLTHRU,DFS_BACK,EXECUTABLE)
> >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> >
> > ...
> >
> > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > 58.2545), maybe hot
> > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 
> > 25.8909)
> > (FALLTHRU)
> > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, 
> > freq
> 19.4182)
> > (TRUE_VALUE,EXECUTABLE)
> > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, 
> > freq
> 12.9455)
> > (TRUE_VALUE,EXECUTABLE)
> > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > ivtmp.22_82 = ivtmp.22_41 + 1;
> > ivtmp.26_72 = ivtmp.26_51 + _80;
> > ivtmp.28_98 = ivtmp.28_90 + _39;
> >
> > These two IVs are always used as unsigned, so IV ops generates:
> >
> >   _73 = stride.3_27 * 4;
> >   _80 = (unsigned long) _73;
> >   _54 = (unsigned long) stride.3_27;
> >   _39 = _54 * 4;
> >
> > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> >
> > This is because candidate 6 and 8 are equivalent under two's complement but
> have
> > different signs.
> >
> > This patch changes it so that if you have two IVs that are affine 
> > equivalent to
> > just pick one over the other.  IV already has code for this, so the patch 
> > just
> > uses affine trees instead of tree for the check.
> >
> > With it we get:
> >
> > <Invariant Expressions>:
> > inv_expr 1:     stride.3_27 * 4
> >
> > <Group-candidate Costs>:
> > Group 0:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   5     0       2       NIL;    NIL;
> >   6     0       3       NIL;    NIL;
> >
> > Group 1:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   1     0       0       NIL;    6
> >   2     0       0       NIL;    6
> >   3     0       0       NIL;    6
> >   4     0       0       NIL;    6
> >
> > Initial set of candidates:
> >   cost: 16 (complexity 3)
> >   reg_cost: 6
> >   cand_cost: 10
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6
> >    group:0 --> iv_cand:6, cost=(0,3)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >   invariant variables: 6
> >   invariant expressions: 1
> >
> > The two patches together results in a 10% performance increase in exchange2 
> > in
> > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > compile
> > time. There's also a 5% performance improvement in fotonik3d and similar
> > reduction in binary size.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >     PR tree-optimization/114932
> >     * fold-const.cc (operand_compare::operand_equal_p): Use it.
> >     (operand_compare::verify_hash_value): Likewise.
> >     (test_operand_equality::test): New.
> >     (fold_const_cc_tests): Use it.
> >     * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> >     * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     PR tree-optimization/114932
> >     * gfortran.dg/addressing-modes_2.f90: New test.
> >
> > ---
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index
> >
> 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b
> > 3301e21e038e3e41 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0,
> > const_tree arg0,
> >        return tree_int_cst_equal (arg0, arg1);
> >      }
> >
> > -  if (!(flags & OEP_ADDRESS_OF))
> > +  if ((flags & OEP_STRUCTURAL_EQ)
> > +      && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
> > +    {
> > +      const_tree t_arg0 = arg0;
> > +      const_tree t_arg1 = arg1;
> > +      STRIP_NOPS (arg0);
> > +      STRIP_NOPS (arg1);
> > +      /* Only recurse if the conversion was one that was valid to strip.  
> > */
> > +      if (t_arg0 != arg0 || t_arg1 != arg1)
> > +   return operand_equal_p (type0, arg0, type1, arg1, flags);
> > +    }
> > +
> > +  if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
> >      {
> >        /* If both types don't have the same signedness, then we can't 
> > consider
> >      them equal.  We must check this before the STRIP_NOPS calls
> > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0,
> > const_tree arg0,
> >    if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
> >      return false;
> >
> > +  /* Check if we are checking an operation where the two's compliment 
> > bitwise
> > +     representation of the result is not the same between signed and
> > +     unsigned arithmetic.  */
> > +  switch (TREE_CODE (arg0))
> > +  {
> > +  case PLUS_EXPR:
> > +  case MINUS_EXPR:
> > +  case MULT_EXPR:
> > +    break;
> > +
> > +  default:
> > +    /* Clear the structural comparison bit.  */
> > +    flags &= ~OEP_STRUCTURAL_EQ;
> > +    break;
> > +  }
> > +
> >  /* Define macros to test an operand from arg0 and arg1 for equality and a
> >     variant that allows null and views null as being different from any
> >     non-null value.  In the latter case, if either is null, the both
> > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree
> > arg0, const_tree arg1,
> >      {
> >        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
> >     {
> > -     if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> > +     if (arg0 != arg1 && !(flags & (OEP_DECL_NAME |
> > OEP_STRUCTURAL_EQ)))
> >         {
> >           inchash::hash hstate0 (0), hstate1 (0);
> >           hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> > @@ -17357,6 +17385,95 @@ test_arithmetic_folding ()
> >                                x);
> >  }
> >
> > +namespace test_operand_equality {
> > +
> > +/* Verify structural equality.  */
> > +
> > +/* Execute fold_vec_perm_cst unit tests.  */
> > +
> > +static void
> > +test ()
> > +{
> > +  tree stype = integer_type_node;
> > +  tree utype = unsigned_type_node;
> > +  tree x = create_tmp_var_raw (stype, "x");
> > +  tree y = create_tmp_var_raw (stype, "y");
> > +  tree z = create_tmp_var_raw (stype, "z");
> > +  tree four = build_int_cst (stype, 4);
> > +  tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
> > +  tree rhs1 = fold_convert (stype,
> > +           fold_build2 (PLUS_EXPR, utype,
> > +                        fold_convert (utype, x),
> > +                        fold_convert (utype, y)));
> > +
> > +  /* (int)((unsigned x) + (unsigned y)) == x + y.  */
> > +  ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
> > +
> > +  /* (int)(unsigned) x == x.  */
> > +  tree lhs2 = build1 (NOP_EXPR, stype,
> > +           build1 (NOP_EXPR, utype, x));
> > +  tree rhs2 = x;
> > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
> > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));
> > +
> > +  /* (unsigned x) + (unsigned y) == x + y.  */
> > +  tree lhs3 = lhs1;
> > +  tree rhs3 = fold_build2 (PLUS_EXPR, utype,
> > +                      fold_convert (utype, x),
> > +                      fold_convert (utype, y));
> > +  ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
> > +
> > +  /* (unsigned x) / (unsigned y) == x / y.  */
> > +  tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
> > +  tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > +                      fold_convert (utype, x),
> > +                      fold_convert (utype, y));
> > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
> > +
> > +  /* (unsigned x) / 4 == x / 4.  */
> > +  tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
> > +  tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > +                      fold_convert (utype, x),
> > +                      fold_convert (utype, four));
> > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
> > +
> > +  /* (unsigned x) + 4 == x + 4.  */
> > +  tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
> > +  tree rhs6 = fold_build2 (PLUS_EXPR, utype,
> > +                      fold_convert (utype, x),
> > +                      fold_convert (utype, four));
> > +  ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
> > +
> > +  /* (unsigned x) + 4 == 4 + x.  */
> > +  tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
> > +  tree rhs7 = fold_build2 (PLUS_EXPR, utype,
> > +                      fold_convert (utype, x),
> > +                      fold_convert (utype, four));
> > +  ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
> > +
> > +  /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z.  */
> > +  tree lhs8 = fold_build2 (PLUS_EXPR, stype,
> > +                      fold_build2 (MULT_EXPR, stype,
> > +                                   fold_build2 (PLUS_EXPR, stype, four, x),
> > +                                   y),
> > +                      z);
> > +  tree rhs8 = fold_build2 (MULT_EXPR, utype,
> > +                      fold_build2 (PLUS_EXPR, utype,
> > +                                   fold_convert (utype, x),
> > +                                   fold_convert (utype, four)),
> > +                      fold_convert (utype, y));
> > +  rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
> > +  ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
> > +  ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
> > +}
> > +}
> > +
> >  namespace test_fold_vec_perm_cst {
> >
> >  /* Build a VECTOR_CST corresponding to VMODE, and has
> > @@ -18150,6 +18267,7 @@ fold_const_cc_tests ()
> >    test_vector_folding ();
> >    test_vec_duplicate_folding ();
> >    test_fold_vec_perm_cst::test ();
> > +  test_operand_equality::test ();
> >  }
> >
> >  } // namespace selftest
> > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > new file mode 100644
> > index
> >
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > 2e24a4973c8539fae
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > @@ -0,0 +1,20 @@
> > +! { dg-do compile { target aarch64-*-* } }
> > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > +
> > +module a
> > +integer, parameter :: b=3, c=b
> > +contains
> > +subroutine d(block)
> > +integer f, col   , block(:, :, :), e
> > +do f = 1, c
> > +   do col = 1, c
> > +             block(:f,                          :, e()) = do
> > +     end do
> > +  end do
> > +  end
> > +  end
> > +
> > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 
> > IVs:} ivopts
> }
> > }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 
> > IVs:} 1
> > ivopts } }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 
> > IVs:} 1
> > ivopts } }
> > +
> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index
> >
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0
> > 7d133bc393aa4a7 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -962,7 +962,9 @@ enum operand_equal_flag {
> >    /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
> >       of the same kind.  Used to compare VLA bounds involving parameters
> >       across redeclarations of the same function.  */
> > -  OEP_DECL_NAME = 512
> > +  OEP_DECL_NAME = 512,
> > +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
> > +  OEP_STRUCTURAL_EQ = 1024
> >  };
> >
> >  /* Enum and arrays used for tree allocation stats.
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index
> >
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3
> > ed5e614b7ccc702f 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree
> > *use_p,
> >
> >       /* Check if it has the same stripped base and step.  */
> >       if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > -         && operand_equal_p (iv->step, use->iv->step, 0)
> > -         && operand_equal_p (addr_base, use->addr_base, 0))
> > +         && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> > +         && operand_equal_p (addr_base, use->addr_base,
> > OEP_STRUCTURAL_EQ))
> >         break;
> >     }
> >        if (i == data->vgroups.length ())
> >
> >
> >
> >
> > --

Reply via email to