On Wed, Dec 18, 2024 at 2:55 PM Aleksandar Rakic
<aleksandar.ra...@htecgroup.com> wrote:
>
> Hi,
>
> > >
> > > Hi,
> > >
> > > I think I managed to fix indentation from the previous version.
> > >
> > > When comparing the tables showing the candidates for the group 1
> > > before and after applying this patch, it can be observed that
> > > complexities for the candidates where the computation depends on the
> > > invariant expressions or the invariant variables should be at least
> > > one, which aligns with the approach used in the commit c2b64ce.
> >
> > Commit c2b64ce is
> >
> > +2017-05-11  Bin Cheng  <bin.ch...@arm.com>
> > +
> > +     * tree-ssa-address.c (struct mem_address): Move to header file.
> > +     (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global.
> > +     * tree-ssa-address.h (struct mem_address): Move from C file.
> > +     (valid_mem_ref_p, move_fixed_address_to_symbol): Declare.
> >
> > that hasn't any "approach" to complexity, it just makes a function
> > global.
>
> In the commit c2b64ce, which is the parent of the commit f9f69dd, which
> introduced the bug, in the function get_address_cost the complexity is
> incremented by one if var_present != 0, so that's the reason why I've
> mentioned that commit.

OK, so that was confusing - you should have mentioned the commit
_changing_ the behavior.  I see that change is very large though.

> > >
> > > ===== Before this patch =====
> > > Group 1:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   1     11      0       5;      NIL;
> > >   2     11      0       6;      NIL;
> > >   4     8       0       7;      NIL;
> > >   5     9       0       8;      NIL;
> > >   6     1       0       NIL;    NIL;
> > >   7     1       1       NIL;    NIL;
> > >   9     7       0       5;      NIL;
> > > ===== Before this patch =====
> > > ===== After this patch =====
> > > Group 1:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   1     11      2       4;      NIL;
> >
> > why does complexity go up to 2 from 0 here?
>
> The complexity for certain candidates equals 2 in the figure because two
> conditions are met:
> 1) parts.offset != NULL_TREE && !integer_zerop (parts.offset),
> 2) (inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) ||
> (inv_expr && *inv_expr != NULL).
>
> > >   2     11      1       4;      NIL;
> > >   4     8       1       5;      NIL;
> > >   5     8       2       6;      NIL;
> >
> > Likewise, and why does cost change?
> >
> > >   6     1       0       NIL;    NIL;
> > >   7     1       1       NIL;    NIL;
> > >   9     7       2       4;      NIL;
> >
> > Likewise.
>
> The cost for a candidate in the figure changes because parts.offset and
> aff_inv->offset change, which lead to different cost.
>
> > This comparison is probably not very useful without showing the actual
> > candidate and its uses?
>
> The candidates and the use are attached. For example, the candidate 1
> has a base 0 and a step 1. Use 1 has a base vector_27(D) +
> ((sizetype) _11 + 1) * 4 and a step 4. Use can be presented as ubase +
> ratio * (candidate - cbase), where ratio is ustep/cstep. It equals
> vector_27(D) + ((sizetype) _11 + 1) * 4 + 4 * candidate1, ie. aff_inv
> equals vector_27(D) + (sizetype) _11 * 4 + 4 and aff_var equals
> 4 * candidate1. Then, parts.offset becomes 4 (from aff_inv) and the
> invariant expression 4 equals vector_27(D) + (sizetype) _11 * 4.
>
> > The above before/after figures do not match the testcase
> > ontop of trunk.
> >
> > > ===== After this patch =====
> > >
> > > Hence, if the invariant expressions or the invariant variables are
> > > used when representing use with candidate, the complexity should be
> > > larger for more complex expressions, so it is incremented by one. I
> > > am not sure whether inv_present could be expressed as parts.
> >
> > The testcase looks mips specific - it has just scan-tree-dump-not
> > which is probably easily confused to PASS when it regressed. Can you
> > instead add a gcc.target/mips/ testcase that scans for actual
> > assembler features? If the testcase relies on inlining daxpy then
> > declaring that static helps that you just get dgefa in the final
> > assembly. If you want to scan both functions I suggest to split the
> > testcase into two to make it more reliable.
>
> I positioned on the commit a4954130d4 (c++: define __cpp_pack_indexing
> [PR113798]), applied the patch, and got the same before/after figures.
> Also, both scan-tree-dump-not tests passed. I declared daxpy static in
> the new testcase. I am not sure what features to scan in assembly.

A scan-tree-dump-not easily passes (as said, it passes before/after the
patch for me).

> > I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu
> > generates for the innermost loop of dgefa
> >
> > .L12:
> >         addu    $3,$9,$2
> >         addu    $3,$3,$8
> >         lwc1    $f1,0($3)
> >         lwc1    $f0,0($2)
> >         addiu   $7,$7,1
> >         mul.s   $f1,$f2,$f1
> >         addiu   $2,$2,4
> >         slt     $3,$7,$10
> >         add.s   $f0,$f0,$f1
> >         .set    noreorder
> >         .set    nomacro
> >         bne     $3,$0,.L12
> >
> > and with the patch
> >
> > .L12:
> >         addu    $3,$9,$2
> >         addu    $3,$3,$8
> >         lwc1    $f1,0($3)
> >         lwc1    $f0,0($2)
> >         addiu   $7,$7,1
> >         mul.s   $f1,$f2,$f1
> >         addiu   $2,$2,4
> >         slt     $3,$7,$10
> >         add.s   $f0,$f0,$f1
> >         .set    noreorder
> >         .set    nomacro
> >         bne     $3,$0,.L12
> >
> > that's suspiciously identical?! In fact the whole testcase generates
> > identical code.
>
> Could you please elaborate what do you mean by r15-5650-gd9c908b7503965?

That's the revision as gcc-descr encodes it, git hash d9c908b750 which is
a few days earlier as your one above.

> We managed to generate different assembly code with a patch. We used the
> following command to achieve that:
>
> $PREFIX/bin/mips64-r6-linux-gnu-gcc test.c -O2 \
> -fdump-tree-ivopts-details -S -o test.s,
>
> where $PREFIX is an install directory.
>
> Could you please share the command which you used to generate identical
> code? As I mentioned, we positioned on the commit a4954130d4 and applied
> the patch.

Same as yours, I mentioned the GCC configuration which might be the difference.
Note I've not installed the mips cross but only built all-gcc and
invoked gcc/cc1
directly.

> > So besides not being able to see the actual problem (maybe I need some
> > -march/-mtune?) the actual issue I have with the patch is that aff_inv
> > is tried to be distributed to other components and for parts that fail
> > to be distributed we cost it via
> >
> >   if (comp_inv != NULL_TREE)
> >     cost = force_var_cost (data, comp_inv, inv_vars);
> >
> > simply ensuring the complexity is at least one would have been to
> > change that to
> >
> >   if (comp_inv != NULL_TREE)
> >     {
> >       cost = force_var_cost (data, comp_inv, inv_vars);
> >       /* Ensure complexity is at least one.  */
> >       cost.complexity = MAX (1, cost.complexity);
> >     }
> >
> > or alternatively just do that for the if (comp_inv && inv_expr &&
> > !simple_inv) case
>
> The patch you posted related to complexity seems to be OK, but I would
> slightly modify it as follows:
>
>    if (comp_inv != NULL_TREE)
>      {
>        cost = force_var_cost (data, comp_inv, inv_vars);
>        cost.complexity += 1;
>      }
>
> I said 'the complexity should be at least one' because it was zero in
> the before figure, but it is just one special case.
>
> > (it's a bit odd we adjust cost there only for 'inv_expr != NULL').
>
> Could you please elaborate this sentence? My condition for complexity
> incrementing was:
>
> if ((inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) || \
> (inv_expr && *inv_expr != NULL)).
>
> > The patch you posted instead of just adjusting complexity seems to
> > change the way we distribute the invariant - in particular we now
> > distribute it to parts.offset even when that is not supported
> > (!(ok_with_ratio_p || ok_without_ratio_p)), that's an odd change.
>
> Could you please elaborate this because I think I don't understand you.
> The condition !(ok_with_ratio_p || ok_without_ratio_p) checks whether
> base + index [<< scale] is supported or not. It doesn't check whether
> base + offset is supported. For example, the code on the commit
> a4954130d4 at line 4767 of tree-ssa-loop-ivopts.cc checks addressing
> mode base + offset even if base + index is not checked. I think that
> base + index is not required condition for base + offset.

It looks like an unrelated change to me.  If you are happy with

>    if (comp_inv != NULL_TREE)
>      {
>        cost = force_var_cost (data, comp_inv, inv_vars);
>        cost.complexity += 1;
>      }

can you please give that full testing and re-post such patch?  I'd also
appreciate if you'd move the testcase to be in gcc.target/mips, scanning
for expected assembly, rather than scanning for unexpected things not
appearing.

Please also adjust the commit message to indicate the revision
introducing the regression rather than using the one right before.

Richard,

> Kind regards,
> Aleksandar
>
> ________________________________________
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Thursday, November 28, 2024 1:45 PM
> To: Aleksandar Rakic
> Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic; Jovan Dmitrovic
> Subject: Re: [Bug tree-optimization/109429] [PATCH v2] ivopts: fixed 
> complexities
>
>
>
> On Wed, Sep 25, 2024 at 5:32 PM Aleksandar Rakic
> <aleksandar.ra...@htecgroup.com> wrote:
> >
> > Hi,
> >
> > I think I managed to fix indentation from the previous version.
> >
> > When comparing the tables showing the candidates for the group 1 before
> > and after applying this patch, it can be observed that complexities for
> > the candidates where the computation depends on the invariant
> > expressions or the invariant variables should be at least one, which
> > aligns with the approach used in the commit c2b64ce.
>
> Commit c2b64ce is
>
> +2017-05-11  Bin Cheng  <bin.ch...@arm.com>
> +
> +       * tree-ssa-address.c (struct mem_address): Move to header file.
> +       (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global.
> +       * tree-ssa-address.h (struct mem_address): Move from C file.
> +       (valid_mem_ref_p, move_fixed_address_to_symbol): Declare.
>
> that hasn't any "approach" to complexity, it just makes a function global.
>
> >
> > ===== Before this patch =====
> > Group 1:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   1     11      0       5;      NIL;
> >   2     11      0       6;      NIL;
> >   4     8       0       7;      NIL;
> >   5     9       0       8;      NIL;
> >   6     1       0       NIL;    NIL;
> >   7     1       1       NIL;    NIL;
> >   9     7       0       5;      NIL;
> > ===== Before this patch =====
> > ===== After this patch =====
> > Group 1:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   1     11      2       4;      NIL;
>
> why does complexity go up to 2 from 0 here?
>
> >   2     11      1       4;      NIL;
> >   4     8       1       5;      NIL;
> >   5     8       2       6;      NIL;
>
> Likewise, and why does cost change?
>
> >   6     1       0       NIL;    NIL;
> >   7     1       1       NIL;    NIL;
> >   9     7       2       4;      NIL;
>
> Likewise.
>
> This comparison is probably not very useful without showing the actual 
> candidate
> and its uses?  The above before/after figures do not match the testcase
> ontop of trunk.
>
> > ===== After this patch =====
> >
> > Hence, if the invariant expressions or the invariant variables are used
> > when representing use with candidate, the complexity should be larger
> > for more complex expressions, so it is incremented by one. I am not sure
> > whether inv_present could be expressed as parts.
>
> The testcase looks mips specific - it has just scan-tree-dump-not which
> is probably easily confused to PASS when it regressed.  Can you instead
> add a gcc.target/mips/ testcase that scans for actual assembler features?
> If the testcase relies on inlining daxpy then declaring that static helps that
> you just get dgefa in the final assembly.  If you want to scan both functions
> I suggest to split the testcase into two to make it more reliable.
>
> I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu generates
> for the innermost loop of dgefa
>
> .L12:
>         addu    $3,$9,$2
>         addu    $3,$3,$8
>         lwc1    $f1,0($3)
>         lwc1    $f0,0($2)
>         addiu   $7,$7,1
>         mul.s   $f1,$f2,$f1
>         addiu   $2,$2,4
>         slt     $3,$7,$10
>         add.s   $f0,$f0,$f1
>         .set    noreorder
>         .set    nomacro
>         bne     $3,$0,.L12
>
> and with the patch
>
> .L12:
>         addu    $3,$9,$2
>         addu    $3,$3,$8
>         lwc1    $f1,0($3)
>         lwc1    $f0,0($2)
>         addiu   $7,$7,1
>         mul.s   $f1,$f2,$f1
>         addiu   $2,$2,4
>         slt     $3,$7,$10
>         add.s   $f0,$f0,$f1
>         .set    noreorder
>         .set    nomacro
>         bne     $3,$0,.L12
>
> that's suspiciously identical?!  In fact the whole testcase generates
> identical code.
>
> So besides not being able to see the actual problem (maybe I need some
> -march/-mtune?)
> the actual issue I have with the patch is that aff_inv is tried to be
> distributed to other
> components and for parts that fail to be distributed we cost it via
>
>   if (comp_inv != NULL_TREE)
>     cost = force_var_cost (data, comp_inv, inv_vars);
>
> simply ensuring the complexity is at least one would have been to change that 
> to
>
>   if (comp_inv != NULL_TREE)
>     {
>       cost = force_var_cost (data, comp_inv, inv_vars);
>       /* Ensure complexity is at least one.  */
>       cost.complexity = MAX (1, cost.complexity);
>     }
>
> or alternatively just do that for the if (comp_inv && inv_expr &&
> !simple_inv) case
> (it's a bit odd we adjust cost there only for 'inv_expr != NULL').
>
> The patch you posted instead of just adjusting complexity seems to change
> the way we distribute the invariant - in particular we now distribute it to
> parts.offset even when that is not supported (!(ok_with_ratio_p ||
> ok_without_ratio_p)),
> that's an odd change.
>
> complexity is supposed to be a tie-breaker, so I think that having
> bigger complexity
> for when we can't move it fully to index is OK - in the end any part
> that cannot be
> moved will end up being applied to base I think (that we have
> essentially two functions
> for this, one for costing and one for actual code emission, is a bit
> unfortunate).
>
> In the end I can't ack this patch as I cannot reproduce an effect on
> the testcase
> you added and because of the odd change part of the patch which is clearly not
> only doing what it's description says.
>
> Thanks,
> Richard.
>
>
> > Regards,
> > Aleksandar

Reply via email to