Hi,

> >
> > Hi,
> >
> > I think I managed to fix indentation from the previous version.
> >
> > When comparing the tables showing the candidates for the group 1
> > before and after applying this patch, it can be observed that
> > complexities for the candidates where the computation depends on the
> > invariant expressions or the invariant variables should be at least
> > one, which aligns with the approach used in the commit c2b64ce.
>
> Commit c2b64ce is
>
> +2017-05-11  Bin Cheng  <bin.ch...@arm.com>
> +
> +     * tree-ssa-address.c (struct mem_address): Move to header file.
> +     (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global.
> +     * tree-ssa-address.h (struct mem_address): Move from C file.
> +     (valid_mem_ref_p, move_fixed_address_to_symbol): Declare.
>
> that hasn't any "approach" to complexity, it just makes a function
> global.

In the commit c2b64ce, which is the parent of the commit f9f69dd, which
introduced the bug, in the function get_address_cost the complexity is
incremented by one if var_present != 0, so that's the reason why I've
mentioned that commit.

> >
> > ===== Before this patch =====
> > Group 1:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   1     11      0       5;      NIL;
> >   2     11      0       6;      NIL;
> >   4     8       0       7;      NIL;
> >   5     9       0       8;      NIL;
> >   6     1       0       NIL;    NIL;
> >   7     1       1       NIL;    NIL;
> >   9     7       0       5;      NIL;
> > ===== Before this patch =====
> > ===== After this patch =====
> > Group 1:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   1     11      2       4;      NIL;
>
> why does complexity go up to 2 from 0 here?

The complexity for certain candidates equals 2 in the figure because two
conditions are met:
1) parts.offset != NULL_TREE && !integer_zerop (parts.offset),
2) (inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) ||
(inv_expr && *inv_expr != NULL).

> >   2     11      1       4;      NIL;
> >   4     8       1       5;      NIL;
> >   5     8       2       6;      NIL;
>
> Likewise, and why does cost change?
>
> >   6     1       0       NIL;    NIL;
> >   7     1       1       NIL;    NIL;
> >   9     7       2       4;      NIL;
>
> Likewise.

The cost for a candidate in the figure changes because parts.offset and
aff_inv->offset change, which lead to different cost.

> This comparison is probably not very useful without showing the actual
> candidate and its uses?

The candidates and the use are attached. For example, the candidate 1
has a base 0 and a step 1. Use 1 has a base vector_27(D) +
((sizetype) _11 + 1) * 4 and a step 4. Use can be presented as ubase +
ratio * (candidate - cbase), where ratio is ustep/cstep. It equals
vector_27(D) + ((sizetype) _11 + 1) * 4 + 4 * candidate1, ie. aff_inv
equals vector_27(D) + (sizetype) _11 * 4 + 4 and aff_var equals
4 * candidate1. Then, parts.offset becomes 4 (from aff_inv) and the
invariant expression 4 equals vector_27(D) + (sizetype) _11 * 4.

> The above before/after figures do not match the testcase
> ontop of trunk.
>
> > ===== After this patch =====
> >
> > Hence, if the invariant expressions or the invariant variables are
> > used when representing use with candidate, the complexity should be
> > larger for more complex expressions, so it is incremented by one. I
> > am not sure whether inv_present could be expressed as parts.
>
> The testcase looks mips specific - it has just scan-tree-dump-not
> which is probably easily confused to PASS when it regressed. Can you
> instead add a gcc.target/mips/ testcase that scans for actual
> assembler features? If the testcase relies on inlining daxpy then
> declaring that static helps that you just get dgefa in the final
> assembly. If you want to scan both functions I suggest to split the
> testcase into two to make it more reliable.

I positioned on the commit a4954130d4 (c++: define __cpp_pack_indexing
[PR113798]), applied the patch, and got the same before/after figures.
Also, both scan-tree-dump-not tests passed. I declared daxpy static in
the new testcase. I am not sure what features to scan in assembly.

> I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu
> generates for the innermost loop of dgefa
>
> .L12:
>         addu    $3,$9,$2
>         addu    $3,$3,$8
>         lwc1    $f1,0($3)
>         lwc1    $f0,0($2)
>         addiu   $7,$7,1
>         mul.s   $f1,$f2,$f1
>         addiu   $2,$2,4
>         slt     $3,$7,$10
>         add.s   $f0,$f0,$f1
>         .set    noreorder
>         .set    nomacro
>         bne     $3,$0,.L12
>
> and with the patch
>
> .L12:
>         addu    $3,$9,$2
>         addu    $3,$3,$8
>         lwc1    $f1,0($3)
>         lwc1    $f0,0($2)
>         addiu   $7,$7,1
>         mul.s   $f1,$f2,$f1
>         addiu   $2,$2,4
>         slt     $3,$7,$10
>         add.s   $f0,$f0,$f1
>         .set    noreorder
>         .set    nomacro
>         bne     $3,$0,.L12
>
> that's suspiciously identical?! In fact the whole testcase generates
> identical code.

Could you please elaborate what do you mean by r15-5650-gd9c908b7503965?
We managed to generate different assembly code with a patch. We used the
following command to achieve that:

$PREFIX/bin/mips64-r6-linux-gnu-gcc test.c -O2 \
-fdump-tree-ivopts-details -S -o test.s,

where $PREFIX is an install directory.

Could you please share the command which you used to generate identical
code? As I mentioned, we positioned on the commit a4954130d4 and applied
the patch.

> So besides not being able to see the actual problem (maybe I need some
> -march/-mtune?) the actual issue I have with the patch is that aff_inv
> is tried to be distributed to other components and for parts that fail
> to be distributed we cost it via
>
>   if (comp_inv != NULL_TREE)
>     cost = force_var_cost (data, comp_inv, inv_vars);
>
> simply ensuring the complexity is at least one would have been to
> change that to
>
>   if (comp_inv != NULL_TREE)
>     {
>       cost = force_var_cost (data, comp_inv, inv_vars);
>       /* Ensure complexity is at least one.  */
>       cost.complexity = MAX (1, cost.complexity);
>     }
>
> or alternatively just do that for the if (comp_inv && inv_expr &&
> !simple_inv) case

The patch you posted related to complexity seems to be OK, but I would
slightly modify it as follows:

   if (comp_inv != NULL_TREE)
     {
       cost = force_var_cost (data, comp_inv, inv_vars);
       cost.complexity += 1;
     }

I said 'the complexity should be at least one' because it was zero in
the before figure, but it is just one special case.

> (it's a bit odd we adjust cost there only for 'inv_expr != NULL').

Could you please elaborate this sentence? My condition for complexity
incrementing was:

if ((inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) || \
(inv_expr && *inv_expr != NULL)).

> The patch you posted instead of just adjusting complexity seems to
> change the way we distribute the invariant - in particular we now
> distribute it to parts.offset even when that is not supported
> (!(ok_with_ratio_p || ok_without_ratio_p)), that's an odd change.

Could you please elaborate this because I think I don't understand you.
The condition !(ok_with_ratio_p || ok_without_ratio_p) checks whether
base + index [<< scale] is supported or not. It doesn't check whether
base + offset is supported. For example, the code on the commit
a4954130d4 at line 4767 of tree-ssa-loop-ivopts.cc checks addressing
mode base + offset even if base + index is not checked. I think that
base + index is not required condition for base + offset.

Kind regards,
Aleksandar

________________________________________
From: Richard Biener <richard.guent...@gmail.com>
Sent: Thursday, November 28, 2024 1:45 PM
To: Aleksandar Rakic
Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic; Jovan Dmitrovic
Subject: Re: [Bug tree-optimization/109429] [PATCH v2] ivopts: fixed 
complexities



On Wed, Sep 25, 2024 at 5:32 PM Aleksandar Rakic
<aleksandar.ra...@htecgroup.com> wrote:
>
> Hi,
>
> I think I managed to fix indentation from the previous version.
>
> When comparing the tables showing the candidates for the group 1 before
> and after applying this patch, it can be observed that complexities for
> the candidates where the computation depends on the invariant
> expressions or the invariant variables should be at least one, which
> aligns with the approach used in the commit c2b64ce.

Commit c2b64ce is

+2017-05-11  Bin Cheng  <bin.ch...@arm.com>
+
+       * tree-ssa-address.c (struct mem_address): Move to header file.
+       (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global.
+       * tree-ssa-address.h (struct mem_address): Move from C file.
+       (valid_mem_ref_p, move_fixed_address_to_symbol): Declare.

that hasn't any "approach" to complexity, it just makes a function global.

>
> ===== Before this patch =====
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     11      0       5;      NIL;
>   2     11      0       6;      NIL;
>   4     8       0       7;      NIL;
>   5     9       0       8;      NIL;
>   6     1       0       NIL;    NIL;
>   7     1       1       NIL;    NIL;
>   9     7       0       5;      NIL;
> ===== Before this patch =====
> ===== After this patch =====
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     11      2       4;      NIL;

why does complexity go up to 2 from 0 here?

>   2     11      1       4;      NIL;
>   4     8       1       5;      NIL;
>   5     8       2       6;      NIL;

Likewise, and why does cost change?

>   6     1       0       NIL;    NIL;
>   7     1       1       NIL;    NIL;
>   9     7       2       4;      NIL;

Likewise.

This comparison is probably not very useful without showing the actual candidate
and its uses?  The above before/after figures do not match the testcase
ontop of trunk.

> ===== After this patch =====
>
> Hence, if the invariant expressions or the invariant variables are used
> when representing use with candidate, the complexity should be larger
> for more complex expressions, so it is incremented by one. I am not sure
> whether inv_present could be expressed as parts.

The testcase looks mips specific - it has just scan-tree-dump-not which
is probably easily confused to PASS when it regressed.  Can you instead
add a gcc.target/mips/ testcase that scans for actual assembler features?
If the testcase relies on inlining daxpy then declaring that static helps that
you just get dgefa in the final assembly.  If you want to scan both functions
I suggest to split the testcase into two to make it more reliable.

I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu generates
for the innermost loop of dgefa

.L12:
        addu    $3,$9,$2
        addu    $3,$3,$8
        lwc1    $f1,0($3)
        lwc1    $f0,0($2)
        addiu   $7,$7,1
        mul.s   $f1,$f2,$f1
        addiu   $2,$2,4
        slt     $3,$7,$10
        add.s   $f0,$f0,$f1
        .set    noreorder
        .set    nomacro
        bne     $3,$0,.L12

and with the patch

.L12:
        addu    $3,$9,$2
        addu    $3,$3,$8
        lwc1    $f1,0($3)
        lwc1    $f0,0($2)
        addiu   $7,$7,1
        mul.s   $f1,$f2,$f1
        addiu   $2,$2,4
        slt     $3,$7,$10
        add.s   $f0,$f0,$f1
        .set    noreorder
        .set    nomacro
        bne     $3,$0,.L12

that's suspiciously identical?!  In fact the whole testcase generates
identical code.

So besides not being able to see the actual problem (maybe I need some
-march/-mtune?)
the actual issue I have with the patch is that aff_inv is tried to be
distributed to other
components and for parts that fail to be distributed we cost it via

  if (comp_inv != NULL_TREE)
    cost = force_var_cost (data, comp_inv, inv_vars);

simply ensuring the complexity is at least one would have been to change that to

  if (comp_inv != NULL_TREE)
    {
      cost = force_var_cost (data, comp_inv, inv_vars);
      /* Ensure complexity is at least one.  */
      cost.complexity = MAX (1, cost.complexity);
    }

or alternatively just do that for the if (comp_inv && inv_expr &&
!simple_inv) case
(it's a bit odd we adjust cost there only for 'inv_expr != NULL').

The patch you posted instead of just adjusting complexity seems to change
the way we distribute the invariant - in particular we now distribute it to
parts.offset even when that is not supported (!(ok_with_ratio_p ||
ok_without_ratio_p)),
that's an odd change.

complexity is supposed to be a tie-breaker, so I think that having
bigger complexity
for when we can't move it fully to index is OK - in the end any part
that cannot be
moved will end up being applied to base I think (that we have
essentially two functions
for this, one for costing and one for actual code emission, is a bit
unfortunate).

In the end I can't ack this patch as I cannot reproduce an effect on
the testcase
you added and because of the odd change part of the patch which is clearly not
only doing what it's description says.

Thanks,
Richard.


> Regards,
> Aleksandar
/* { dg-do compile { target mips64-r6-linux-gnu } } */
/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
/* This test ensures that complexity must be greater than zero if there is
   an invariant variable or an invariant expression in the address
   expression.  */

static void daxpy (float *vector1, float *vector2, int n, float fp_const)
{
       for (int i = 0; i < n; ++i)
               vector1[i] += fp_const * vector2[i];
}

void dgefa (float *vector, int m, int n, int l)
{
       for (int i = 0; i < n - 1; ++i)
       {
               for (int j = i + 1; j < n; ++j)
               {
                       float t = vector[m * j + l];
                       daxpy (&vector[m * i + i + 1],
                               &vector[m * j + i + 1], n - (i + 1), t);
               }
       }
}


/* { dg-final { scan-tree-dump-not "Processing loop 3(.*\n)*<IV Groups>:(.*\n)*Group 1:\n  Type:.*ADDRESS(.*\n)*Group 1:\n  cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*(.+\t.+\t0\t\\d+(, \\d+)*;\t.+\n)(.*\n)*Group 2:\n  cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*Selected IV set for loop 3" "ivopts" { target { mips64-r6-linux-gnu } } } } */


/* { dg-final { scan-tree-dump-not "Processing loop 3(.*\n)*<IV Groups>:(.*\n)*Group 1:\n  Type:.*ADDRESS(.*\n)*Group 1:\n  cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*(.+\t.+\t0\t.+\t\\d+(, \\d+)*\n)(.*\n)*Group 2:\n  cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*Selected IV set for loop 3" "ivopts" { target { mips64-r6-linux-gnu } } } } */


Attachment: test_ivopts_bug_after_patch.s
Description: test_ivopts_bug_after_patch.s

Attachment: test_ivopts_bug_before_patch.s
Description: test_ivopts_bug_before_patch.s

Attachment: tree-ivopts-details-before.c.188t.ivopts
Description: tree-ivopts-details-before.c.188t.ivopts

Attachment: tree-ivopts-details-after.c.188t.ivopts
Description: tree-ivopts-details-after.c.188t.ivopts

Reply via email to