RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]

Tamar Christina Mon, 24 Jun 2024 06:31:16 -0700

> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Thursday, June 20, 2024 8:49 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; bin.ch...@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during 
> candidate
> selection [PR114932]
> 
> On Wed, 19 Jun 2024, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguent...@suse.de>
> > > Sent: Wednesday, June 19, 2024 12:55 PM
> > > To: Tamar Christina <tamar.christ...@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>;
> bin.ch...@linux.alibaba.com
> > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during
> candidate
> > > selection [PR114932]
> > >
> > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > IVOPTS normally uses affine trees to perform comparisons between 
> > > > different
> IVs,
> > > > but these seem to have been missing in two key spots and instead normal
> tree
> > > > equivalencies used.
> > > >
> > > > In some cases where we have a structural equivalence but not a 
> > > > signedness
> > > > equivalencies we end up generating both a signed and unsigned IV for the
> same
> > > > candidate.
> > > >
> > > > This happens quite a lot with fortran but can also happen in C because 
> > > > this
> came
> > > > code is unable to figure out when one expression is a multiple of 
> > > > another.
> > > >
> > > > As an example in the attached testcase we get:
> > > >
> > > > Initial set of candidates:
> > > >   cost: 24 (complexity 3)
> > > >   reg_cost: 9
> > > >   cand_cost: 15
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6, 8
> > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1, 2
> > > >
> > > > <Invariant Expressions>:
> > > > inv_expr 1:     stride.3_27 * 4
> > > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > > >
> > > > These end up being used in the same group:
> > > >
> > > > Group 1:
> > > > cand  cost    compl.  inv.expr.       inv.vars
> > > > 1     0       0       NIL;    6
> > > > 2     0       0       NIL;    6
> > > > 3     0       0       NIL;    6
> > > >
> > > > which ends up with IV opts picking the signed and unsigned IVs:
> > > >
> > > > Improved to:
> > > >   cost: 24 (complexity 3)
> > > >   reg_cost: 9
> > > >   cand_cost: 15
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6, 8
> > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1, 2
> > > >
> > > > and so generates the same IV as both signed and unsigned:
> > > >
> > > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, 
> > > > freq
> > > 58.2545), maybe hot
> > > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 
> > > > 6.4080)
> > > (FALLTHRU,EXECUTABLE)
> > > > ;;                25 [always]  count:191126046 (estimated locally, freq 
> > > > 51.8465)
> > > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > > >
> > > > ...
> > > >
> > > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, 
> > > > freq
> > > 58.2545), maybe hot
> > > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 
> > > > 25.8909)
> > > (FALLTHRU)
> > > ;;                21 [33.3% (guessed)]  count:71582790 (estimated 
> > > locally, freq
> 19.4182)
> > > (TRUE_VALUE,EXECUTABLE)
> > > ;;                31 [33.3% (guessed)]  count:47721860 (estimated 
> > > locally, freq
> 12.9455)
> > > (TRUE_VALUE,EXECUTABLE)
> > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > >
> > >                                                                           
> > >          ivtmp.22_82 = ivtmp.22_41 + 1;
> > > ivtmp.26_72 = ivtmp.26_51 + _80;
> > > ivtmp.28_98 = ivtmp.28_90 + _39;
> > > >
> > > > These two IVs are always used as unsigned, so IV ops generates:
> > > >
> > > >   _73 = stride.3_27 * 4;
> > > >   _80 = (unsigned long) _73;
> > > >   _54 = (unsigned long) stride.3_27;
> > > >   _39 = _54 * 4;
> > > >
> > > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > > >
> > > > This is because candidate 6 and 8 are structurally equivalent but have
> different
> > > > signs.
> > > >
> > > > This patch changes it so that if you have two IVs that are affine 
> > > > equivalent to
> > > > just pick one over the other.  IV already has code for this, so the 
> > > > patch just
> > > > uses affine trees instead of tree for the check.
> > > >
> > > > With it we get:
> > > >
> > > > <Invariant Expressions>:
> > > > inv_expr 1:     stride.3_27 * 4
> > > >
> > > > <Group-candidate Costs>:
> > > > Group 0:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   5     0       2       NIL;    NIL;
> > > >   6     0       3       NIL;    NIL;
> > > >
> > > > Group 1:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   1     0       0       NIL;    6
> > > >   2     0       0       NIL;    6
> > > >   3     0       0       NIL;    6
> > > >   4     0       0       NIL;    6
> > > >
> > > > Initial set of candidates:
> > > >   cost: 16 (complexity 3)
> > > >   reg_cost: 6
> > > >   cand_cost: 10
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6
> > > >    group:0 --> iv_cand:6, cost=(0,3)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1
> > > >
> > > > The two patches together results in a 10% performance increase in 
> > > > exchange2
> in
> > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > compile
> > > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > > reduction in binary size.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         PR tree-optimization/114932
> > > >         * tree-affine.cc (aff_combination_constant_multiple_p): Take 
> > > > zero
> > > >         offsets into account.
> > > >         * tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > > >         (record_group_use): Use it.
> > > >         (constant_multiple_of): Also check equality under
> > > >         aff_combination_constant_multiple_p.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         PR tree-optimization/114932
> > > >         * gfortran.dg/addressing-modes_2.f90: New test.
> > > >
> > > > ---
> > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > > 2e24a4973c8539fae
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > @@ -0,0 +1,20 @@
> > > > +! { dg-do compile { target aarch64-*-* } }
> > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > > +
> > > > +module a
> > > > +integer, parameter :: b=3, c=b
> > > > +contains
> > > > +subroutine d(block)
> > > > +integer f, col   , block(:, :, :), e
> > > > +do f = 1, c
> > > > +   do col = 1, c
> > > > +             block(:f,                          :, e()) = do
> > > > +     end do
> > > > +  end do
> > > > +  end
> > > > +  end
> > > > +
> > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 
> > > > 3 IVs:}
> ivopts
> > > } }
> > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ 
> > > > niters, 2 IVs:}
> 1
> > > ivopts } }
> > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ 
> > > > niters, 1 IVs:}
> 1
> > > ivopts } }
> > > > +
> > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > > index
> > >
> d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> > > 6346890ddeab4096 100644
> > > > --- a/gcc/tree-affine.cc
> > > > +++ b/gcc/tree-affine.cc
> > > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree
> *val,
> > > aff_tree *div,
> > > >                                      &mult_set, mult))
> > > >      return false;
> > > >
> > > > +  /* Everything is a multiple of 0, which means we shoudn't enforce 
> > > > that
> > > > +     mult_set is checked, since that forced the only valid multiple of
> > > > +     val and div to be 0 whereas 1 is also possible.  */
> > > > +  if (known_eq (val->offset, 0)
> > > > +      && known_eq (div->offset, 0))
> > > > +    mult_set = false;
> > > > +
> > >
> > > In fact all numbers are possible?  Shouldn't this be better handled
> > > in wide_int_constant_multiple_p by special-casing
> > > known_eq (div, 0) in the known_eq (val, 0) condition by simply
> > > returning 'true' without checking or setting *mult_set?
> > >
> > > The docs of wide_int_constant_multiple_p is odd:
> > >
> > > /* If VAL != CST * DIV for any constant CST, returns false.
> > >
> > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> > > Or s/any/all/?
> > >
> >
> > Yeah, that condition would always be false.
> 
> Can you split out a patch to fix this?

Doing now.

> 
> > > >    for (i = 0; i < div->n; i++)
> > > >      {
> > > >        class aff_comb_elt *elt
> > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > > index
> > >
> 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> > > 64cd2facd9ddb4914 100644
> > > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> > > >    return exit;
> > > >  }
> > > >
> > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT 
> > > > and
> > > return
> > > > +   whether they are the same under affine equality.  */
> > > > +
> > > > +static bool
> > > > +affine_compare_eq (aff_tree &left, tree right)
> > > > +{
> > > > +  aff_tree aff_right;
> > > > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > > > +  aff_combination_scale (&aff_right, -1);
> > > > +  aff_combination_add (&aff_right, &left);
> > > > +  return aff_combination_zero_p (&aff_right);
> > > > +}
> > > > +
> > > >
> > > >  /* Given a nested expression in ARG consisting of PLUS or MULT try to 
> > > > see if
> one
> > > >     of the arguments of each expression is a constant and that the type 
> > > > of the
> > > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >    tree addr_base = NULL;
> > > >    struct iv_group *group = NULL;
> > > >    poly_uint64 addr_offset = 0;
> > > > +  aff_tree iv_step, iv_addr_base;
> > > > +
> > > > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> > > >
> > > >    /* Record non address type use in a new group.  */
> > > >    if (address_p (type))
> > > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >        tree addr_toffset;
> > > >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> > > >        addr_offset = int_cst_value (addr_toffset);
> > > > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> > > &iv_addr_base);
> > > >        for (i = 0; i < data->vgroups.length (); i++)
> > > >         {
> > > >           struct iv_use *use;
> > > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >
> > > >           /* Check if it has the same stripped base and step.  */
> > > >           if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > > -             && operand_equal_p (iv->step, use->iv->step, 0)
> > > > -             && operand_equal_p (addr_base, use->addr_base, 0))
> > > > +             && affine_compare_eq (iv_step, use->iv->step)
> > > > +             && affine_compare_eq (iv_addr_base, use->addr_base))
> > >
> > > There's only this use of addr_base so I think the opportunity is to
> > > turn iv_use->addr_base into aff_tree (even though that's a quite big
> > > representation).
> >
> > Ah, true.  Will do.
> >
> > >
> > > For the testcase, what are the two IVs we are comparing?  I wonder
> > > why you need the affine compare for iv->step?
> > >
> >
> > Because step builds up the IV expressions and can also be an expression.
> >
> > In this case:
> >
> > >>> p debug (iv->step)
> > ((unsigned long) stride.3_27) * 4
> > >>> p debug (use->iv->step)
> > (sizetype) (stride.3_27 * 4)
> >
> > Note that the original expressions were both unsigned, but the STRIP_NOPS
> > made one signed.   They still wouldn't have compared equal though due to
> > the different cast locations.
> >
> > For completeness the base here is
> >
> > >>> p debug (use->addr_base)
> > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + 
> (sizetype)
> block.0_26)
> > $9 = void
> >
> > >>> p debug (addr_base)
> > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + 
> (sizetype)
> block.0_26)
> > $10 = void
> > >
> >
> > > >             break;
> > > >         }
> > > >        if (i == data->vgroups.length ())
> > > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot,
> widest_int
> > > *mul)
> > > >        return true;
> > > >      }
> > > >
> > > > +  aff_tree aff_top, aff_bot;
> > > > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > > > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > > > +  poly_widest_int poly_mul;
> > > > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, 
> > > > &poly_mul)
> > > > +      && poly_mul.is_constant (mul))
> > > > +    return true;
> > > > +
> > >
> > > So why does stripping nops not work here?
> >
> > So this is where we compare different IV expressions to determine which
> > IVs compute the same thing and thus can be in the same group.
> >
> > The STRIP_NOPS don't work because while the incoming types are the same
> > the casts are different.  So:
> >
> > >>> p debug (ustep)
> > (unsigned long) stride.3_27 * 4
> > $3 = void
> > >>> p debug (cstep)
> > (unsigned long) (stride.3_27 * 4)
> > $4 = void
> >
> > Which is of course stripped to:
> >
> > >>> p debug (top)
> > (unsigned long) stride.3_27 * 4
> > $1 = void
> > >>> p debug (bot)
> > stride.3_27 * 4
> 
> So we're expecting constant_multiple_of to compute *mul == 1, proving
> equality?
> 

Indeed

> The function seems to try stripping ops from TOP until it reaches
> an expression equal to BOT and that's what fails to trigger here.
> 
> What I originally wondered was iff we compute the affine combinations
> why not use only aff_combination_constant_multiple_p?
> 

I think that's probably easier, The rest of the code seems to indeed be
repeating the work of aff_combination_constant_multiple_p.

I can try replacing the whole thing with aff_combination_constant_multiple_p
and see?

> I might also point back to the idea I threw in somewhere, adding
> OEP_VALUE (or a better name) to the set of flags accepted by
> operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> touching hashing?
> 

Yes, That can indeed be done with this approach.  The hashing was that before I
was trying to prevent the "duplicate" IV expressions from being created in the
first place by modifying get_loop_invariant_expr.

This function looks up if we have already seen a particular IV expression and if
we have it just returns that expression.  However after reading more of the code
I realized this wasn't the right approach, as without also dealing with the 
candidates
we'd end up creating IV expression that can't be handled by any candidate.

IVops would just give up then.   Reading the code it seems that 
get_loop_invariant_expr
is just there to prevent blatant duplicates.  i.e. it treats `(signed) a` and 
`a` as the same.

This is also why I think that everywhere else *has* to continue stripping the 
expression.

On a note from Richard S that he thought IVopts already had some code to deal 
with
expressions that differ only in signs led me to take a different approach.

The goal wasn't to remove the different sign/unsigned IV expressions, but 
instead get
Then to be servable by the same candidates. i.e. we want them in the same 
candidate
groups and then candidate optimization will just do its thing.

That seemed a more natural fit to how it worked.

Do you want me to try the operand_equal_p approach? Though in this case the 
issue
is we not only need to know they're equal, but also need to know the scale 
factor.

get_computation_aff_1 scales the common type IV by the scale we determined,
so I think operand_equal_p would not be very useful here.  But it does look like
constant_multiple_of can just be implemented with 
aff_combination_constant_multiple_p.

Should I try?

Thanks,
Tamar


> > Both of these compute the same thing so by doing the affine compare they
> > end up in the same IV groups and can be costed.  Later during candidate 
> > selection
> > these are the steps we're comparing to see if the candidate is the same 
> > invariant.
> >
> > The full list is:
> >
> > <IV Groups>:
> > Group 0:
> >   Type: REFERENCE ADDRESS
> >   Use 0.0:
> >     At stmt:    *_4 = _33;
> >     At pos:     *_4
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + 
> > (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 4)
> >       Step:     (sizetype) (stride.3_27 * 4)
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >   Use 0.1:
> >     At stmt:    *_78 = _33;
> >     At pos:     *_78
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + 
> > (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 8)
> >       Step:     (unsigned long) stride.3_27 * 4
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >   Use 0.2:
> >     At stmt:    *_45 = _33;
> >     At pos:     *_45
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + 
> > (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 12)
> >       Step:     (unsigned long) stride.3_27 * 4
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >
> > And all these IVs are the same but with a slightly different base offset.  
> > By doing
> the affine compare here IV ops sees that
> > The best candidate is:
> >
> > Candidate 6:
> >   Depend on inv.exprs: 1
> >   Var befor: ivtmp.26_51
> >   Var after: ivtmp.26_72
> >   Incr POS: before exit test
> >   IV struct:
> >     Type:       unsigned long
> >     Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + 
> > (unsigned
> long) block.0_26
> >     Step:       (unsigned long) (stride.3_27 * 4)
> >     Object:     (void *) block.0_26
> >     Biv:        N
> >     Overflowness wrto loop niter:       Overflow
> >
> > And just builds the three IVs from the same candidate.
> >
> > Does this cover what you wanted?
> >
> > Cheers,
> > Tamar
> >
> > >
> > > >    code = TREE_CODE (top);
> > > >    switch (code)
> > > >      {
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguent...@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]

Reply via email to