RE: [PATCH 2/2]middle-end: use two's complement equality when comparing IVs during candidate selection [PR114932]

Tamar Christina Mon, 14 Oct 2024 08:12:31 -0700

ping


> -----Original Message-----
> From: Tamar Christina <[email protected]>
> Sent: Monday, September 23, 2024 8:41 AM
> To: [email protected]
> Cc: nd <[email protected]>; [email protected]; [email protected]
> Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> comparing IVs during candidate selection [PR114932]
> 
> ping
> 
> > -----Original Message-----
> > From: Tamar Christina
> > Sent: Tuesday, September 10, 2024 8:57 PM
> > To: Tamar Christina <[email protected]>; [email protected]
> > Cc: nd <[email protected]>; [email protected]; [email protected]
> > Subject: RE: [PATCH 2/2]middle-end: use two's complement equality when
> > comparing IVs during candidate selection [PR114932]
> >
> > ping
> >
> > > -----Original Message-----
> > > From: Tamar Christina <[email protected]>
> > > Sent: Tuesday, August 20, 2024 2:06 PM
> > > To: [email protected]
> > > Cc: nd <[email protected]>; [email protected]; [email protected]
> > > Subject: [PATCH 2/2]middle-end: use two's complement equality when
> > comparing
> > > IVs during candidate selection [PR114932]
> > >
> > > Hi All,
> > >
> > > IVOPTS normally uses affine trees to perform comparisons between different
> IVs,
> > > but these seem to have been missing in two key spots and instead normal 
> > > tree
> > > equivalencies used.
> > >
> > > In some cases where we have a two-complements equivalence but not a strict
> > > signedness equivalencies we end up generating both a signed and unsigned 
> > > IV
> for
> > > the same candidate.
> > >
> > > This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag
> will
> > > check if the operands would produce the same bit values after the
> computations
> > > even if the final sign is different.
> > >
> > > This happens quite a lot with fortran but can also happen in C because 
> > > this
> came
> > > code is unable to figure out when one expression is a multiple of another.
> > >
> > > As an example in the attached testcase we get:
> > >
> > > Initial set of candidates:
> > >   cost: 24 (complexity 3)
> > >   reg_cost: 9
> > >   cand_cost: 15
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6, 8
> > >    group:0 --> iv_cand:6, cost=(0,1)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >    group:2 --> iv_cand:8, cost=(0,1)
> > >    group:3 --> iv_cand:8, cost=(0,1)
> > >   invariant variables: 6
> > >   invariant expressions: 1, 2
> > >
> > > <Invariant Expressions>:
> > > inv_expr 1:     stride.3_27 * 4
> > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > >
> > > These end up being used in the same group:
> > >
> > > Group 1:
> > > cand  cost    compl.  inv.expr.       inv.vars
> > > 1     0       0       NIL;    6
> > > 2     0       0       NIL;    6
> > > 3     0       0       NIL;    6
> > >
> > > which ends up with IV opts picking the signed and unsigned IVs:
> > >
> > > Improved to:
> > >   cost: 24 (complexity 3)
> > >   reg_cost: 9
> > >   cand_cost: 15
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6, 8
> > >    group:0 --> iv_cand:6, cost=(0,1)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >    group:2 --> iv_cand:8, cost=(0,1)
> > >    group:3 --> iv_cand:8, cost=(0,1)
> > >   invariant variables: 6
> > >   invariant expressions: 1, 2
> > >
> > > and so generates the same IV as both signed and unsigned:
> > >
> > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, 
> > > freq
> > > 58.2545), maybe hot
> > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 
> > > 6.4080)
> > > (FALLTHRU,EXECUTABLE)
> > > ;;                25 [always]  count:191126046 (estimated locally, freq 
> > > 51.8465)
> > > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > >
> > > ...
> > >
> > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, 
> > > freq
> > > 58.2545), maybe hot
> > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 
> > > 25.8909)
> > > (FALLTHRU)
> > > ;;                21 [33.3% (guessed)]  count:71582790 (estimated 
> > > locally, freq
> > 19.4182)
> > > (TRUE_VALUE,EXECUTABLE)
> > > ;;                31 [33.3% (guessed)]  count:47721860 (estimated 
> > > locally, freq
> > 12.9455)
> > > (TRUE_VALUE,EXECUTABLE)
> > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > > ivtmp.22_82 = ivtmp.22_41 + 1;
> > > ivtmp.26_72 = ivtmp.26_51 + _80;
> > > ivtmp.28_98 = ivtmp.28_90 + _39;
> > >
> > > These two IVs are always used as unsigned, so IV ops generates:
> > >
> > >   _73 = stride.3_27 * 4;
> > >   _80 = (unsigned long) _73;
> > >   _54 = (unsigned long) stride.3_27;
> > >   _39 = _54 * 4;
> > >
> > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > >
> > > This is because candidate 6 and 8 are equivalent under two's complement 
> > > but
> > have
> > > different signs.
> > >
> > > This patch changes it so that if you have two IVs that are affine 
> > > equivalent to
> > > just pick one over the other.  IV already has code for this, so the patch 
> > > just
> > > uses affine trees instead of tree for the check.
> > >
> > > With it we get:
> > >
> > > <Invariant Expressions>:
> > > inv_expr 1:     stride.3_27 * 4
> > >
> > > <Group-candidate Costs>:
> > > Group 0:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   5     0       2       NIL;    NIL;
> > >   6     0       3       NIL;    NIL;
> > >
> > > Group 1:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   1     0       0       NIL;    6
> > >   2     0       0       NIL;    6
> > >   3     0       0       NIL;    6
> > >   4     0       0       NIL;    6
> > >
> > > Initial set of candidates:
> > >   cost: 16 (complexity 3)
> > >   reg_cost: 6
> > >   cand_cost: 10
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6
> > >    group:0 --> iv_cand:6, cost=(0,3)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >   invariant variables: 6
> > >   invariant expressions: 1
> > >
> > > The two patches together results in a 10% performance increase in 
> > > exchange2
> in
> > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > compile
> > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > reduction in binary size.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> > > -m32, -m64 and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/114932
> > >   * fold-const.cc (operand_compare::operand_equal_p): Use it.
> > >   (operand_compare::verify_hash_value): Likewise.
> > >   (test_operand_equality::test): New.
> > >   (fold_const_cc_tests): Use it.
> > >   * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
> > >   * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR tree-optimization/114932
> > >   * gfortran.dg/addressing-modes_2.f90: New test.
> > >
> > > ---
> > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > index
> > >
> >
> 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b
> > > 3301e21e038e3e41 100644
> > > --- a/gcc/fold-const.cc
> > > +++ b/gcc/fold-const.cc
> > > @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0,
> > > const_tree arg0,
> > >        return tree_int_cst_equal (arg0, arg1);
> > >      }
> > >
> > > -  if (!(flags & OEP_ADDRESS_OF))
> > > +  if ((flags & OEP_STRUCTURAL_EQ)
> > > +      && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
> > > +    {
> > > +      const_tree t_arg0 = arg0;
> > > +      const_tree t_arg1 = arg1;
> > > +      STRIP_NOPS (arg0);
> > > +      STRIP_NOPS (arg1);
> > > +      /* Only recurse if the conversion was one that was valid to strip. 
> > >  */
> > > +      if (t_arg0 != arg0 || t_arg1 != arg1)
> > > + return operand_equal_p (type0, arg0, type1, arg1, flags);
> > > +    }
> > > +
> > > +  if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
> > >      {
> > >        /* If both types don't have the same signedness, then we can't 
> > > consider
> > >    them equal.  We must check this before the STRIP_NOPS calls
> > > @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0,
> > > const_tree arg0,
> > >    if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
> > >      return false;
> > >
> > > +  /* Check if we are checking an operation where the two's compliment 
> > > bitwise
> > > +     representation of the result is not the same between signed and
> > > +     unsigned arithmetic.  */
> > > +  switch (TREE_CODE (arg0))
> > > +  {
> > > +  case PLUS_EXPR:
> > > +  case MINUS_EXPR:
> > > +  case MULT_EXPR:
> > > +    break;
> > > +
> > > +  default:
> > > +    /* Clear the structural comparison bit.  */
> > > +    flags &= ~OEP_STRUCTURAL_EQ;
> > > +    break;
> > > +  }
> > > +
> > >  /* Define macros to test an operand from arg0 and arg1 for equality and a
> > >     variant that allows null and views null as being different from any
> > >     non-null value.  In the latter case, if either is null, the both
> > > @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree
> > > arg0, const_tree arg1,
> > >      {
> > >        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
> > >   {
> > > -   if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> > > +   if (arg0 != arg1 && !(flags & (OEP_DECL_NAME |
> > > OEP_STRUCTURAL_EQ)))
> > >       {
> > >         inchash::hash hstate0 (0), hstate1 (0);
> > >         hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> > > @@ -17357,6 +17385,95 @@ test_arithmetic_folding ()
> > >                              x);
> > >  }
> > >
> > > +namespace test_operand_equality {
> > > +
> > > +/* Verify structural equality.  */
> > > +
> > > +/* Execute fold_vec_perm_cst unit tests.  */
> > > +
> > > +static void
> > > +test ()
> > > +{
> > > +  tree stype = integer_type_node;
> > > +  tree utype = unsigned_type_node;
> > > +  tree x = create_tmp_var_raw (stype, "x");
> > > +  tree y = create_tmp_var_raw (stype, "y");
> > > +  tree z = create_tmp_var_raw (stype, "z");
> > > +  tree four = build_int_cst (stype, 4);
> > > +  tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
> > > +  tree rhs1 = fold_convert (stype,
> > > +         fold_build2 (PLUS_EXPR, utype,
> > > +                      fold_convert (utype, x),
> > > +                      fold_convert (utype, y)));
> > > +
> > > +  /* (int)((unsigned x) + (unsigned y)) == x + y.  */
> > > +  ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
> > > +
> > > +  /* (int)(unsigned) x == x.  */
> > > +  tree lhs2 = build1 (NOP_EXPR, stype,
> > > +         build1 (NOP_EXPR, utype, x));
> > > +  tree rhs2 = x;
> > > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));
> > > +
> > > +  /* (unsigned x) + (unsigned y) == x + y.  */
> > > +  tree lhs3 = lhs1;
> > > +  tree rhs3 = fold_build2 (PLUS_EXPR, utype,
> > > +                    fold_convert (utype, x),
> > > +                    fold_convert (utype, y));
> > > +  ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
> > > +
> > > +  /* (unsigned x) / (unsigned y) == x / y.  */
> > > +  tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
> > > +  tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > > +                    fold_convert (utype, x),
> > > +                    fold_convert (utype, y));
> > > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
> > > +
> > > +  /* (unsigned x) / 4 == x / 4.  */
> > > +  tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
> > > +  tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
> > > +                    fold_convert (utype, x),
> > > +                    fold_convert (utype, four));
> > > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
> > > +
> > > +  /* (unsigned x) + 4 == x + 4.  */
> > > +  tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
> > > +  tree rhs6 = fold_build2 (PLUS_EXPR, utype,
> > > +                    fold_convert (utype, x),
> > > +                    fold_convert (utype, four));
> > > +  ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
> > > +
> > > +  /* (unsigned x) + 4 == 4 + x.  */
> > > +  tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
> > > +  tree rhs7 = fold_build2 (PLUS_EXPR, utype,
> > > +                    fold_convert (utype, x),
> > > +                    fold_convert (utype, four));
> > > +  ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
> > > +
> > > +  /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z.  */
> > > +  tree lhs8 = fold_build2 (PLUS_EXPR, stype,
> > > +                    fold_build2 (MULT_EXPR, stype,
> > > +                                 fold_build2 (PLUS_EXPR, stype, four, x),
> > > +                                 y),
> > > +                    z);
> > > +  tree rhs8 = fold_build2 (MULT_EXPR, utype,
> > > +                    fold_build2 (PLUS_EXPR, utype,
> > > +                                 fold_convert (utype, x),
> > > +                                 fold_convert (utype, four)),
> > > +                    fold_convert (utype, y));
> > > +  rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
> > > +  ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
> > > +  ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
> > > +}
> > > +}
> > > +
> > >  namespace test_fold_vec_perm_cst {
> > >
> > >  /* Build a VECTOR_CST corresponding to VMODE, and has
> > > @@ -18150,6 +18267,7 @@ fold_const_cc_tests ()
> > >    test_vector_folding ();
> > >    test_vec_duplicate_folding ();
> > >    test_fold_vec_perm_cst::test ();
> > > +  test_operand_equality::test ();
> > >  }
> > >
> > >  } // namespace selftest
> > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > new file mode 100644
> > > index
> > >
> >
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > > 2e24a4973c8539fae
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > @@ -0,0 +1,20 @@
> > > +! { dg-do compile { target aarch64-*-* } }
> > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > +
> > > +module a
> > > +integer, parameter :: b=3, c=b
> > > +contains
> > > +subroutine d(block)
> > > +integer f, col   , block(:, :, :), e
> > > +do f = 1, c
> > > +   do col = 1, c
> > > +             block(:f,                          :, e()) = do
> > > +     end do
> > > +  end do
> > > +  end
> > > +  end
> > > +
> > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 
> > > IVs:}
> ivopts
> > }
> > > }
> > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 
> > > 2 IVs:} 1
> > > ivopts } }
> > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 
> > > 1 IVs:} 1
> > > ivopts } }
> > > +
> > > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > > index
> > >
> >
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e0
> > > 7d133bc393aa4a7 100644
> > > --- a/gcc/tree-core.h
> > > +++ b/gcc/tree-core.h
> > > @@ -962,7 +962,9 @@ enum operand_equal_flag {
> > >    /* In conjunction with OEP_LEXICOGRAPHIC considers names of 
> > > declarations
> > >       of the same kind.  Used to compare VLA bounds involving parameters
> > >       across redeclarations of the same function.  */
> > > -  OEP_DECL_NAME = 512
> > > +  OEP_DECL_NAME = 512,
> > > +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */
> > > +  OEP_STRUCTURAL_EQ = 1024
> > >  };
> > >
> > >  /* Enum and arrays used for tree allocation stats.
> > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > index
> > >
> >
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3
> > > ed5e614b7ccc702f 100644
> > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > >
> > >     /* Check if it has the same stripped base and step.  */
> > >     if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > -       && operand_equal_p (iv->step, use->iv->step, 0)
> > > -       && operand_equal_p (addr_base, use->addr_base, 0))
> > > +       && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> > > +       && operand_equal_p (addr_base, use->addr_base,
> > > OEP_STRUCTURAL_EQ))
> > >       break;
> > >   }
> > >        if (i == data->vgroups.length ())
> > >
> > >
> > >
> > >
> > > --

RE: [PATCH 2/2]middle-end: use two's complement equality when comparing IVs during candidate selection [PR114932]

Reply via email to