On Tue, Aug 20, 2024 at 3:08 PM Tamar Christina <tamar.christ...@arm.com> wrote:
>
> Hi All,
>
> IVOPTS normally uses affine trees to perform comparisons between different 
> IVs,
> but these seem to have been missing in two key spots and instead normal tree
> equivalencies used.
>
> In some cases where we have a two-complements equivalence but not a strict
> signedness equivalencies we end up generating both a signed and unsigned IV 
> for
> the same candidate.
>
> This patch implements a new OEP flag called OEP_STRUCTURAL_EQ.  This flag will
> check if the operands would produce the same bit values after the computations
> even if the final sign is different.

I think the name is badly chosen - we already have OEP_LEXICOGRAPHIC and
OEP_BITWISE.  I would suggest to use OEP_ASSUME_TWOS_COMPLEMENT
or OEP_ASSUME_WRAPV.

> This happens quite a lot with fortran but can also happen in C because this 
> came
> code is unable to figure out when one expression is a multiple of another.
>
> As an example in the attached testcase we get:
>
> Initial set of candidates:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
>
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> inv_expr 2:     (unsigned long) stride.3_27 * 4
>
> These end up being used in the same group:
>
> Group 1:
> cand  cost    compl.  inv.expr.       inv.vars
> 1     0       0       NIL;    6
> 2     0       0       NIL;    6
> 3     0       0       NIL;    6
>
> which ends up with IV opts picking the signed and unsigned IVs:
>
> Improved to:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
>
> and so generates the same IV as both signed and unsigned:
>
> ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 
> 58.2545), maybe hot
> ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 
> 6.4080) (FALLTHRU,EXECUTABLE)
> ;;                25 [always]  count:191126046 (estimated locally, freq 
> 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
>   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
>   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
>   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
>   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
>
> ...
>
> ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 
> 58.2545), maybe hot
> ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 
> 25.8909) (FALLTHRU)
> ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, 
> freq 19.4182) (TRUE_VALUE,EXECUTABLE)
> ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, 
> freq 12.9455) (TRUE_VALUE,EXECUTABLE)
> # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> ivtmp.22_82 = ivtmp.22_41 + 1;
> ivtmp.26_72 = ivtmp.26_51 + _80;
> ivtmp.28_98 = ivtmp.28_90 + _39;
>
> These two IVs are always used as unsigned, so IV ops generates:
>
>   _73 = stride.3_27 * 4;
>   _80 = (unsigned long) _73;
>   _54 = (unsigned long) stride.3_27;
>   _39 = _54 * 4;
>
> Which means that in e.g. exchange2 we generate a lot of duplicate code.
>
> This is because candidate 6 and 8 are equivalent under two's complement but 
> have
> different signs.
>
> This patch changes it so that if you have two IVs that are affine equivalent 
> to
> just pick one over the other.  IV already has code for this, so the patch just
> uses affine trees instead of tree for the check.
>
> With it we get:
>
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
>
> <Group-candidate Costs>:
> Group 0:
>   cand  cost    compl.  inv.expr.       inv.vars
>   5     0       2       NIL;    NIL;
>   6     0       3       NIL;    NIL;
>
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     0       0       NIL;    6
>   2     0       0       NIL;    6
>   3     0       0       NIL;    6
>   4     0       0       NIL;    6
>
> Initial set of candidates:
>   cost: 16 (complexity 3)
>   reg_cost: 6
>   cand_cost: 10
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6
>    group:0 --> iv_cand:6, cost=(0,3)
>    group:1 --> iv_cand:1, cost=(0,0)
>   invariant variables: 6
>   invariant expressions: 1
>
> The two patches together results in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
> time. There's also a 5% performance improvement in fotonik3d and similar
> reduction in binary size.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR tree-optimization/114932
>         * fold-const.cc (operand_compare::operand_equal_p): Use it.
>         (operand_compare::verify_hash_value): Likewise.
>         (test_operand_equality::test): New.
>         (fold_const_cc_tests): Use it.
>         * tree-core.h (enum operand_equal_flag): Add OEP_STRUCTURAL_EQ.
>         * tree-ssa-loop-ivopts.cc (record_group_use): Check for structural eq.
>
> gcc/testsuite/ChangeLog:
>
>         PR tree-optimization/114932
>         * gfortran.dg/addressing-modes_2.f90: New test.
>
> ---
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 
> 71e82b1d76d4106c7c23c54af8b35905a1af9f1c..52a4465de0355220befa6d6b3301e21e038e3e41
>  100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -3208,7 +3208,19 @@ operand_compare::operand_equal_p (tree type0, 
> const_tree arg0,
>        return tree_int_cst_equal (arg0, arg1);
>      }
>
> -  if (!(flags & OEP_ADDRESS_OF))
> +  if ((flags & OEP_STRUCTURAL_EQ)
> +      && (CONVERT_EXPR_P (arg0) || CONVERT_EXPR_P (arg1)))
> +    {
> +      const_tree t_arg0 = arg0;
> +      const_tree t_arg1 = arg1;
> +      STRIP_NOPS (arg0);
> +      STRIP_NOPS (arg1);
> +      /* Only recurse if the conversion was one that was valid to strip.  */
> +      if (t_arg0 != arg0 || t_arg1 != arg1)
> +       return operand_equal_p (type0, arg0, type1, arg1, flags);

(*)

> +    }
> +
> +  if (!(flags & (OEP_ADDRESS_OF | OEP_STRUCTURAL_EQ)))
>      {
>        /* If both types don't have the same signedness, then we can't consider
>          them equal.  We must check this before the STRIP_NOPS calls
> @@ -3439,6 +3451,22 @@ operand_compare::operand_equal_p (tree type0, 
> const_tree arg0,
>    if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
>      return false;
>
> +  /* Check if we are checking an operation where the two's compliment bitwise
> +     representation of the result is not the same between signed and
> +     unsigned arithmetic.  */
> +  switch (TREE_CODE (arg0))
> +  {
> +  case PLUS_EXPR:
> +  case MINUS_EXPR:
> +  case MULT_EXPR:
> +    break;
> +
> +  default:
> +    /* Clear the structural comparison bit.  */
> +    flags &= ~OEP_STRUCTURAL_EQ;

So this handling means that a / (int)((unsigned)b - (unsigned)c)) and
a / (b - c) are not equal with respect to OEP_STRUCTURAL_EQ because
we drop that flag when processing the division and thus the recursion on
the minus operands doesn't have it any more.

So shouldn't this and (*) be integrated with the TYPE_UNSIGNED compare in
the !OEP_ADDRESS_OF case?  Aka not require same signedness for
PLUS/MINUS/MULT/CONVERT?

I think we have to document what the TYPE, ARG combination means
semantically.  As far as I understand TYPE doesn't specify the signedness
of the operation ARG but instead represents a possibly stripped conversion
of the result?

--

There's

  /* Don't handle more cases for OEP_BITWISE, since we can't guarantee that
     two instances of undefined behavior will give identical results.  */
  if (flags & (OEP_ONLY_CONST | OEP_BITWISE))
    return false;

which is true - so I think we need to document what using OEP_STRUCTURAL_EQ
means, in particular that it does not mean you can replace ARG0 with
ARG1 without
externally guaranteeing you are not introducing new UB.  I do wonder about
say comparing (int)((unsigned)a + (unsigned)b) + (c + d) with
(a + b) + (int)((unsigned)c + (unsigned)d) for example - we can replace both
with a all-unsigned computation or replace one with the other when both would
have been always executed.  I think this needs to be clearly documented.

> +    break;
> +  }
> +
>  /* Define macros to test an operand from arg0 and arg1 for equality and a
>     variant that allows null and views null as being different from any
>     non-null value.  In the latter case, if either is null, the both
> @@ -4218,7 +4246,7 @@ operand_compare::verify_hash_value (const_tree arg0, 
> const_tree arg1,
>      {
>        if (operand_equal_p (arg0, arg1, flags | OEP_NO_HASH_CHECK))
>         {
> -         if (arg0 != arg1 && !(flags & OEP_DECL_NAME))
> +         if (arg0 != arg1 && !(flags & (OEP_DECL_NAME | OEP_STRUCTURAL_EQ)))
>             {
>               inchash::hash hstate0 (0), hstate1 (0);
>               hash_operand (arg0, hstate0, flags | OEP_HASH_CHECK);
> @@ -17357,6 +17385,95 @@ test_arithmetic_folding ()
>                                    x);
>  }
>
> +namespace test_operand_equality {
> +
> +/* Verify structural equality.  */
> +
> +/* Execute fold_vec_perm_cst unit tests.  */
> +
> +static void
> +test ()
> +{
> +  tree stype = integer_type_node;
> +  tree utype = unsigned_type_node;
> +  tree x = create_tmp_var_raw (stype, "x");
> +  tree y = create_tmp_var_raw (stype, "y");
> +  tree z = create_tmp_var_raw (stype, "z");
> +  tree four = build_int_cst (stype, 4);
> +  tree lhs1 = fold_build2 (PLUS_EXPR, stype, x, y);
> +  tree rhs1 = fold_convert (stype,
> +               fold_build2 (PLUS_EXPR, utype,
> +                            fold_convert (utype, x),
> +                            fold_convert (utype, y)));
> +
> +  /* (int)((unsigned x) + (unsigned y)) == x + y.  */
> +  ASSERT_TRUE (operand_equal_p (lhs1, rhs1, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs1, rhs1, 0));
> +
> +  /* (int)(unsigned) x == x.  */
> +  tree lhs2 = build1 (NOP_EXPR, stype,
> +               build1 (NOP_EXPR, utype, x));
> +  tree rhs2 = x;
> +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, OEP_STRUCTURAL_EQ));
> +  ASSERT_TRUE (operand_equal_p (lhs2, rhs2, 0));

That's not too useful - we'd never see this due to folding.

> +
> +  /* (unsigned x) + (unsigned y) == x + y.  */
> +  tree lhs3 = lhs1;
> +  tree rhs3 = fold_build2 (PLUS_EXPR, utype,
> +                          fold_convert (utype, x),
> +                          fold_convert (utype, y));
> +  ASSERT_TRUE (operand_equal_p (lhs3, rhs3, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs3, rhs3, 0));
> +
> +  /* (unsigned x) / (unsigned y) == x / y.  */
> +  tree lhs4 = fold_build2 (TRUNC_DIV_EXPR, stype, x, y);;
> +  tree rhs4 = fold_build2 (TRUNC_DIV_EXPR, utype,
> +                          fold_convert (utype, x),
> +                          fold_convert (utype, y));
> +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs4, rhs4, 0));
> +
> +  /* (unsigned x) / 4 == x / 4.  */
> +  tree lhs5 = fold_build2 (TRUNC_DIV_EXPR, stype, x, four);;
> +  tree rhs5 = fold_build2 (TRUNC_DIV_EXPR, utype,
> +                          fold_convert (utype, x),
> +                          fold_convert (utype, four));
> +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs5, rhs5, 0));
> +
> +  /* (unsigned x) + 4 == x + 4.  */
> +  tree lhs6 = fold_build2 (PLUS_EXPR, stype, x, four);
> +  tree rhs6 = fold_build2 (PLUS_EXPR, utype,
> +                          fold_convert (utype, x),
> +                          fold_convert (utype, four));
> +  ASSERT_TRUE (operand_equal_p (lhs6, rhs6, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs6, rhs6, 0));
> +
> +  /* (unsigned x) + 4 == 4 + x.  */
> +  tree lhs7 = fold_build2 (PLUS_EXPR, stype, four, x);
> +  tree rhs7 = fold_build2 (PLUS_EXPR, utype,
> +                          fold_convert (utype, x),
> +                          fold_convert (utype, four));
> +  ASSERT_TRUE (operand_equal_p (lhs7, rhs7, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs7, rhs7, 0));
> +
> +  /* ((unsigned x) + 4) * (unsigned y)) + z == ((4 + x) * y) + z.  */
> +  tree lhs8 = fold_build2 (PLUS_EXPR, stype,
> +                          fold_build2 (MULT_EXPR, stype,
> +                                       fold_build2 (PLUS_EXPR, stype, four, 
> x),
> +                                       y),
> +                          z);
> +  tree rhs8 = fold_build2 (MULT_EXPR, utype,
> +                          fold_build2 (PLUS_EXPR, utype,
> +                                       fold_convert (utype, x),
> +                                       fold_convert (utype, four)),
> +                          fold_convert (utype, y));
> +  rhs8 = fold_build2 (PLUS_EXPR, stype, fold_convert (stype, rhs8), z);
> +  ASSERT_TRUE (operand_equal_p (lhs8, rhs8, OEP_STRUCTURAL_EQ));
> +  ASSERT_FALSE (operand_equal_p (lhs8, rhs8, 0));
> +}
> +}
> +
>  namespace test_fold_vec_perm_cst {
>
>  /* Build a VECTOR_CST corresponding to VMODE, and has
> @@ -18150,6 +18267,7 @@ fold_const_cc_tests ()
>    test_vector_folding ();
>    test_vec_duplicate_folding ();
>    test_fold_vec_perm_cst::test ();
> +  test_operand_equality::test ();
>  }
>
>  } // namespace selftest
> diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 
> b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> @@ -0,0 +1,20 @@
> +! { dg-do compile { target aarch64-*-* } }
> +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> +
> +module a
> +integer, parameter :: b=3, c=b
> +contains
> +subroutine d(block)
> +integer f, col   , block(:, :, :), e
> +do f = 1, c
> +   do col = 1, c
> +             block(:f,                          :, e()) = do
> +     end do
> +  end do
> +  end
> +  end
> +
> +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 
> IVs:} ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 
> IVs:} 1 ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 
> IVs:} 1 ivopts } }
> +
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index 
> 27c569c77022227643151fa4a909a95f3d45bf55..fe398d99bcaa9e8fcec45e5e07d133bc393aa4a7
>  100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -962,7 +962,9 @@ enum operand_equal_flag {
>    /* In conjunction with OEP_LEXICOGRAPHIC considers names of declarations
>       of the same kind.  Used to compare VLA bounds involving parameters
>       across redeclarations of the same function.  */
> -  OEP_DECL_NAME = 512
> +  OEP_DECL_NAME = 512,
> +  /* Check for structural equivalence and ignore NOP_CONVERT casts.  */

This should better document the assumptions - it treats expressions producing
the same bit value even though they differ in sign the same(?), and it ignores
differences in UB(?).

OEP_BITWISE is a bit of an odd thing here, it seems to require type
compatibility
for constants in particular - something operand_equal_p usually doesn't.  But
while ignoring sign-conversions there it doesn't elsewhere (it
presents itself as
stricter than the default).  I'd have called the new mode OEP_BITWISE if that
were not already taken ...

> +  OEP_STRUCTURAL_EQ = 1024
>  };
>
>  /* Enum and arrays used for tree allocation stats.
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..b05e4ac1e63b5c110707a8a3ed5e614b7ccc702f
>  100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -1623,8 +1623,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>
>           /* Check if it has the same stripped base and step.  */
>           if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> -             && operand_equal_p (iv->step, use->iv->step, 0)
> -             && operand_equal_p (addr_base, use->addr_base, 0))
> +             && operand_equal_p (iv->step, use->iv->step, OEP_STRUCTURAL_EQ)
> +             && operand_equal_p (addr_base, use->addr_base, 
> OEP_STRUCTURAL_EQ))
>             break;
>         }
>        if (i == data->vgroups.length ())
>
>
>
>
> --

Reply via email to