On Wed, Dec 3, 2025 at 4:19 PM Aleksandar Rakic
<[email protected]> wrote:
>
> Hi Richard,
>
> Thank you for your feedback on the patch. I’d like to better understand
> your point about this being considered an “unrelated change.”
>
> Could you please explain why distributing the invariant to
> `parts.offset` in the case of `!(ok_with_ratio_p || ok_without_ratio_p)`
> is seen as unrelated? From my understanding, this condition only checks
> support for `base + index [<< scale]`, not for `base + offset`. For
> example, in commit 45f3501bda (around line 4754 in
> tree-ssa-loop-ivopts.cc), the code still validates `base + offset` even
> when `base + index` is not supported.
>
> If possible, could you share an example or scenario where distributing
> to `parts.offset` would not be valid? This would help clarify the design
> intent and ensure the patch aligns with expectations.

I think it's on your side to show this change is required and positive,
not mine to prove it is not.

Sorry, but every time I look at this patch it takes me more than an hour
to familiarize myself again with this code, this feels like wasted time,
esp. since last time I did this I could not reproduce any change from
your patch on the testcase(s).

Richard.

>
> Thanks for your time and insights.
>
> Best regards,
> Aleksandar
>
>
> ________________________________________
> From: Richard Biener <[email protected]>
> Sent: Wednesday, January 8, 2025 3:30 PM
> To: Aleksandar Rakic
> Cc: [email protected]; Djordje Todorovic; Jovan Dmitrovic
> Subject: Re: [Bug tree-optimization/109429] [PATCH v2] ivopts: fixed 
> complexities
>
> [Some people who received this message don't often get email from 
> [email protected]. Learn why this is important at 
> https://aka.ms/LearnAboutSenderIdentification ]
>
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>
>
> On Wed, Dec 18, 2024 at 2:55 PM Aleksandar Rakic
> <[email protected]> wrote:
> >
> > Hi,
> >
> > > >
> > > > Hi,
> > > >
> > > > I think I managed to fix indentation from the previous version.
> > > >
> > > > When comparing the tables showing the candidates for the group 1
> > > > before and after applying this patch, it can be observed that
> > > > complexities for the candidates where the computation depends on the
> > > > invariant expressions or the invariant variables should be at least
> > > > one, which aligns with the approach used in the commit c2b64ce.
> > >
> > > Commit c2b64ce is
> > >
> > > +2017-05-11  Bin Cheng  <[email protected]>
> > > +
> > > +     * tree-ssa-address.c (struct mem_address): Move to header file.
> > > +     (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global.
> > > +     * tree-ssa-address.h (struct mem_address): Move from C file.
> > > +     (valid_mem_ref_p, move_fixed_address_to_symbol): Declare.
> > >
> > > that hasn't any "approach" to complexity, it just makes a function
> > > global.
> >
> > In the commit c2b64ce, which is the parent of the commit f9f69dd, which
> > introduced the bug, in the function get_address_cost the complexity is
> > incremented by one if var_present != 0, so that's the reason why I've
> > mentioned that commit.
>
> OK, so that was confusing - you should have mentioned the commit
> _changing_ the behavior.  I see that change is very large though.
>
> > > >
> > > > ===== Before this patch =====
> > > > Group 1:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   1     11      0       5;      NIL;
> > > >   2     11      0       6;      NIL;
> > > >   4     8       0       7;      NIL;
> > > >   5     9       0       8;      NIL;
> > > >   6     1       0       NIL;    NIL;
> > > >   7     1       1       NIL;    NIL;
> > > >   9     7       0       5;      NIL;
> > > > ===== Before this patch =====
> > > > ===== After this patch =====
> > > > Group 1:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   1     11      2       4;      NIL;
> > >
> > > why does complexity go up to 2 from 0 here?
> >
> > The complexity for certain candidates equals 2 in the figure because two
> > conditions are met:
> > 1) parts.offset != NULL_TREE && !integer_zerop (parts.offset),
> > 2) (inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) ||
> > (inv_expr && *inv_expr != NULL).
> >
> > > >   2     11      1       4;      NIL;
> > > >   4     8       1       5;      NIL;
> > > >   5     8       2       6;      NIL;
> > >
> > > Likewise, and why does cost change?
> > >
> > > >   6     1       0       NIL;    NIL;
> > > >   7     1       1       NIL;    NIL;
> > > >   9     7       2       4;      NIL;
> > >
> > > Likewise.
> >
> > The cost for a candidate in the figure changes because parts.offset and
> > aff_inv->offset change, which lead to different cost.
> >
> > > This comparison is probably not very useful without showing the actual
> > > candidate and its uses?
> >
> > The candidates and the use are attached. For example, the candidate 1
> > has a base 0 and a step 1. Use 1 has a base vector_27(D) +
> > ((sizetype) _11 + 1) * 4 and a step 4. Use can be presented as ubase +
> > ratio * (candidate - cbase), where ratio is ustep/cstep. It equals
> > vector_27(D) + ((sizetype) _11 + 1) * 4 + 4 * candidate1, ie. aff_inv
> > equals vector_27(D) + (sizetype) _11 * 4 + 4 and aff_var equals
> > 4 * candidate1. Then, parts.offset becomes 4 (from aff_inv) and the
> > invariant expression 4 equals vector_27(D) + (sizetype) _11 * 4.
> >
> > > The above before/after figures do not match the testcase
> > > ontop of trunk.
> > >
> > > > ===== After this patch =====
> > > >
> > > > Hence, if the invariant expressions or the invariant variables are
> > > > used when representing use with candidate, the complexity should be
> > > > larger for more complex expressions, so it is incremented by one. I
> > > > am not sure whether inv_present could be expressed as parts.
> > >
> > > The testcase looks mips specific - it has just scan-tree-dump-not
> > > which is probably easily confused to PASS when it regressed. Can you
> > > instead add a gcc.target/mips/ testcase that scans for actual
> > > assembler features? If the testcase relies on inlining daxpy then
> > > declaring that static helps that you just get dgefa in the final
> > > assembly. If you want to scan both functions I suggest to split the
> > > testcase into two to make it more reliable.
> >
> > I positioned on the commit a4954130d4 (c++: define __cpp_pack_indexing
> > [PR113798]), applied the patch, and got the same before/after figures.
> > Also, both scan-tree-dump-not tests passed. I declared daxpy static in
> > the new testcase. I am not sure what features to scan in assembly.
>
> A scan-tree-dump-not easily passes (as said, it passes before/after the
> patch for me).
>
> > > I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu
> > > generates for the innermost loop of dgefa
> > >
> > > .L12:
> > >         addu    $3,$9,$2
> > >         addu    $3,$3,$8
> > >         lwc1    $f1,0($3)
> > >         lwc1    $f0,0($2)
> > >         addiu   $7,$7,1
> > >         mul.s   $f1,$f2,$f1
> > >         addiu   $2,$2,4
> > >         slt     $3,$7,$10
> > >         add.s   $f0,$f0,$f1
> > >         .set    noreorder
> > >         .set    nomacro
> > >         bne     $3,$0,.L12
> > >
> > > and with the patch
> > >
> > > .L12:
> > >         addu    $3,$9,$2
> > >         addu    $3,$3,$8
> > >         lwc1    $f1,0($3)
> > >         lwc1    $f0,0($2)
> > >         addiu   $7,$7,1
> > >         mul.s   $f1,$f2,$f1
> > >         addiu   $2,$2,4
> > >         slt     $3,$7,$10
> > >         add.s   $f0,$f0,$f1
> > >         .set    noreorder
> > >         .set    nomacro
> > >         bne     $3,$0,.L12
> > >
> > > that's suspiciously identical?! In fact the whole testcase generates
> > > identical code.
> >
> > Could you please elaborate what do you mean by r15-5650-gd9c908b7503965?
>
> That's the revision as gcc-descr encodes it, git hash d9c908b750 which is
> a few days earlier as your one above.
>
> > We managed to generate different assembly code with a patch. We used the
> > following command to achieve that:
> >
> > $PREFIX/bin/mips64-r6-linux-gnu-gcc test.c -O2 \
> > -fdump-tree-ivopts-details -S -o test.s,
> >
> > where $PREFIX is an install directory.
> >
> > Could you please share the command which you used to generate identical
> > code? As I mentioned, we positioned on the commit a4954130d4 and applied
> > the patch.
>
> Same as yours, I mentioned the GCC configuration which might be the 
> difference.
> Note I've not installed the mips cross but only built all-gcc and
> invoked gcc/cc1
> directly.
>
> > > So besides not being able to see the actual problem (maybe I need some
> > > -march/-mtune?) the actual issue I have with the patch is that aff_inv
> > > is tried to be distributed to other components and for parts that fail
> > > to be distributed we cost it via
> > >
> > >   if (comp_inv != NULL_TREE)
> > >     cost = force_var_cost (data, comp_inv, inv_vars);
> > >
> > > simply ensuring the complexity is at least one would have been to
> > > change that to
> > >
> > >   if (comp_inv != NULL_TREE)
> > >     {
> > >       cost = force_var_cost (data, comp_inv, inv_vars);
> > >       /* Ensure complexity is at least one.  */
> > >       cost.complexity = MAX (1, cost.complexity);
> > >     }
> > >
> > > or alternatively just do that for the if (comp_inv && inv_expr &&
> > > !simple_inv) case
> >
> > The patch you posted related to complexity seems to be OK, but I would
> > slightly modify it as follows:
> >
> >    if (comp_inv != NULL_TREE)
> >      {
> >        cost = force_var_cost (data, comp_inv, inv_vars);
> >        cost.complexity += 1;
> >      }
> >
> > I said 'the complexity should be at least one' because it was zero in
> > the before figure, but it is just one special case.
> >
> > > (it's a bit odd we adjust cost there only for 'inv_expr != NULL').
> >
> > Could you please elaborate this sentence? My condition for complexity
> > incrementing was:
> >
> > if ((inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) || \
> > (inv_expr && *inv_expr != NULL)).
> >
> > > The patch you posted instead of just adjusting complexity seems to
> > > change the way we distribute the invariant - in particular we now
> > > distribute it to parts.offset even when that is not supported
> > > (!(ok_with_ratio_p || ok_without_ratio_p)), that's an odd change.
> >
> > Could you please elaborate this because I think I don't understand you.
> > The condition !(ok_with_ratio_p || ok_without_ratio_p) checks whether
> > base + index [<< scale] is supported or not. It doesn't check whether
> > base + offset is supported. For example, the code on the commit
> > a4954130d4 at line 4767 of tree-ssa-loop-ivopts.cc checks addressing
> > mode base + offset even if base + index is not checked. I think that
> > base + index is not required condition for base + offset.
>
> It looks like an unrelated change to me.  If you are happy with
>
> >    if (comp_inv != NULL_TREE)
> >      {
> >        cost = force_var_cost (data, comp_inv, inv_vars);
> >        cost.complexity += 1;
> >      }
>
> can you please give that full testing and re-post such patch?  I'd also
> appreciate if you'd move the testcase to be in gcc.target/mips, scanning
> for expected assembly, rather than scanning for unexpected things not
> appearing.
>
> Please also adjust the commit message to indicate the revision
> introducing the regression rather than using the one right before.
>
> Richard,
>
> > Kind regards,
> > Aleksandar
> >
> > ________________________________________
> > From: Richard Biener <[email protected]>
> > Sent: Thursday, November 28, 2024 1:45 PM
> > To: Aleksandar Rakic
> > Cc: [email protected]; Djordje Todorovic; Jovan Dmitrovic
> > Subject: Re: [Bug tree-optimization/109429] [PATCH v2] ivopts: fixed 
> > complexities
> >
> >
> >
> > On Wed, Sep 25, 2024 at 5:32 PM Aleksandar Rakic
> > <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I think I managed to fix indentation from the previous version.
> > >
> > > When comparing the tables showing the candidates for the group 1 before
> > > and after applying this patch, it can be observed that complexities for
> > > the candidates where the computation depends on the invariant
> > > expressions or the invariant variables should be at least one, which
> > > aligns with the approach used in the commit c2b64ce.
> >
> > Commit c2b64ce is
> >
> > +2017-05-11  Bin Cheng  <[email protected]>
> > +
> > +       * tree-ssa-address.c (struct mem_address): Move to header file.
> > +       (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global.
> > +       * tree-ssa-address.h (struct mem_address): Move from C file.
> > +       (valid_mem_ref_p, move_fixed_address_to_symbol): Declare.
> >
> > that hasn't any "approach" to complexity, it just makes a function global.
> >
> > >
> > > ===== Before this patch =====
> > > Group 1:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   1     11      0       5;      NIL;
> > >   2     11      0       6;      NIL;
> > >   4     8       0       7;      NIL;
> > >   5     9       0       8;      NIL;
> > >   6     1       0       NIL;    NIL;
> > >   7     1       1       NIL;    NIL;
> > >   9     7       0       5;      NIL;
> > > ===== Before this patch =====
> > > ===== After this patch =====
> > > Group 1:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   1     11      2       4;      NIL;
> >
> > why does complexity go up to 2 from 0 here?
> >
> > >   2     11      1       4;      NIL;
> > >   4     8       1       5;      NIL;
> > >   5     8       2       6;      NIL;
> >
> > Likewise, and why does cost change?
> >
> > >   6     1       0       NIL;    NIL;
> > >   7     1       1       NIL;    NIL;
> > >   9     7       2       4;      NIL;
> >
> > Likewise.
> >
> > This comparison is probably not very useful without showing the actual 
> > candidate
> > and its uses?  The above before/after figures do not match the testcase
> > ontop of trunk.
> >
> > > ===== After this patch =====
> > >
> > > Hence, if the invariant expressions or the invariant variables are used
> > > when representing use with candidate, the complexity should be larger
> > > for more complex expressions, so it is incremented by one. I am not sure
> > > whether inv_present could be expressed as parts.
> >
> > The testcase looks mips specific - it has just scan-tree-dump-not which
> > is probably easily confused to PASS when it regressed.  Can you instead
> > add a gcc.target/mips/ testcase that scans for actual assembler features?
> > If the testcase relies on inlining daxpy then declaring that static helps 
> > that
> > you just get dgefa in the final assembly.  If you want to scan both 
> > functions
> > I suggest to split the testcase into two to make it more reliable.
> >
> > I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu generates
> > for the innermost loop of dgefa
> >
> > .L12:
> >         addu    $3,$9,$2
> >         addu    $3,$3,$8
> >         lwc1    $f1,0($3)
> >         lwc1    $f0,0($2)
> >         addiu   $7,$7,1
> >         mul.s   $f1,$f2,$f1
> >         addiu   $2,$2,4
> >         slt     $3,$7,$10
> >         add.s   $f0,$f0,$f1
> >         .set    noreorder
> >         .set    nomacro
> >         bne     $3,$0,.L12
> >
> > and with the patch
> >
> > .L12:
> >         addu    $3,$9,$2
> >         addu    $3,$3,$8
> >         lwc1    $f1,0($3)
> >         lwc1    $f0,0($2)
> >         addiu   $7,$7,1
> >         mul.s   $f1,$f2,$f1
> >         addiu   $2,$2,4
> >         slt     $3,$7,$10
> >         add.s   $f0,$f0,$f1
> >         .set    noreorder
> >         .set    nomacro
> >         bne     $3,$0,.L12
> >
> > that's suspiciously identical?!  In fact the whole testcase generates
> > identical code.
> >
> > So besides not being able to see the actual problem (maybe I need some
> > -march/-mtune?)
> > the actual issue I have with the patch is that aff_inv is tried to be
> > distributed to other
> > components and for parts that fail to be distributed we cost it via
> >
> >   if (comp_inv != NULL_TREE)
> >     cost = force_var_cost (data, comp_inv, inv_vars);
> >
> > simply ensuring the complexity is at least one would have been to change 
> > that to
> >
> >   if (comp_inv != NULL_TREE)
> >     {
> >       cost = force_var_cost (data, comp_inv, inv_vars);
> >       /* Ensure complexity is at least one.  */
> >       cost.complexity = MAX (1, cost.complexity);
> >     }
> >
> > or alternatively just do that for the if (comp_inv && inv_expr &&
> > !simple_inv) case
> > (it's a bit odd we adjust cost there only for 'inv_expr != NULL').
> >
> > The patch you posted instead of just adjusting complexity seems to change
> > the way we distribute the invariant - in particular we now distribute it to
> > parts.offset even when that is not supported (!(ok_with_ratio_p ||
> > ok_without_ratio_p)),
> > that's an odd change.
> >
> > complexity is supposed to be a tie-breaker, so I think that having
> > bigger complexity
> > for when we can't move it fully to index is OK - in the end any part
> > that cannot be
> > moved will end up being applied to base I think (that we have
> > essentially two functions
> > for this, one for costing and one for actual code emission, is a bit
> > unfortunate).
> >
> > In the end I can't ack this patch as I cannot reproduce an effect on
> > the testcase
> > you added and because of the odd change part of the patch which is clearly 
> > not
> > only doing what it's description says.
> >
> > Thanks,
> > Richard.
> >
> >
> > > Regards,
> > > Aleksandar

Reply via email to