Hi, > > > > Hi, > > > > I think I managed to fix indentation from the previous version. > > > > When comparing the tables showing the candidates for the group 1 > > before and after applying this patch, it can be observed that > > complexities for the candidates where the computation depends on the > > invariant expressions or the invariant variables should be at least > > one, which aligns with the approach used in the commit c2b64ce. > > Commit c2b64ce is > > +2017-05-11 Bin Cheng <bin.ch...@arm.com> > + > + * tree-ssa-address.c (struct mem_address): Move to header file. > + (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global. > + * tree-ssa-address.h (struct mem_address): Move from C file. > + (valid_mem_ref_p, move_fixed_address_to_symbol): Declare. > > that hasn't any "approach" to complexity, it just makes a function > global.
In the commit c2b64ce, which is the parent of the commit f9f69dd, which introduced the bug, in the function get_address_cost the complexity is incremented by one if var_present != 0, so that's the reason why I've mentioned that commit. > > > > ===== Before this patch ===== > > Group 1: > > cand cost compl. inv.expr. inv.vars > > 1 11 0 5; NIL; > > 2 11 0 6; NIL; > > 4 8 0 7; NIL; > > 5 9 0 8; NIL; > > 6 1 0 NIL; NIL; > > 7 1 1 NIL; NIL; > > 9 7 0 5; NIL; > > ===== Before this patch ===== > > ===== After this patch ===== > > Group 1: > > cand cost compl. inv.expr. inv.vars > > 1 11 2 4; NIL; > > why does complexity go up to 2 from 0 here? The complexity for certain candidates equals 2 in the figure because two conditions are met: 1) parts.offset != NULL_TREE && !integer_zerop (parts.offset), 2) (inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) || (inv_expr && *inv_expr != NULL). > > 2 11 1 4; NIL; > > 4 8 1 5; NIL; > > 5 8 2 6; NIL; > > Likewise, and why does cost change? > > > 6 1 0 NIL; NIL; > > 7 1 1 NIL; NIL; > > 9 7 2 4; NIL; > > Likewise. The cost for a candidate in the figure changes because parts.offset and aff_inv->offset change, which lead to different cost. > This comparison is probably not very useful without showing the actual > candidate and its uses? The candidates and the use are attached. For example, the candidate 1 has a base 0 and a step 1. Use 1 has a base vector_27(D) + ((sizetype) _11 + 1) * 4 and a step 4. Use can be presented as ubase + ratio * (candidate - cbase), where ratio is ustep/cstep. It equals vector_27(D) + ((sizetype) _11 + 1) * 4 + 4 * candidate1, ie. aff_inv equals vector_27(D) + (sizetype) _11 * 4 + 4 and aff_var equals 4 * candidate1. Then, parts.offset becomes 4 (from aff_inv) and the invariant expression 4 equals vector_27(D) + (sizetype) _11 * 4. > The above before/after figures do not match the testcase > ontop of trunk. > > > ===== After this patch ===== > > > > Hence, if the invariant expressions or the invariant variables are > > used when representing use with candidate, the complexity should be > > larger for more complex expressions, so it is incremented by one. I > > am not sure whether inv_present could be expressed as parts. > > The testcase looks mips specific - it has just scan-tree-dump-not > which is probably easily confused to PASS when it regressed. Can you > instead add a gcc.target/mips/ testcase that scans for actual > assembler features? If the testcase relies on inlining daxpy then > declaring that static helps that you just get dgefa in the final > assembly. If you want to scan both functions I suggest to split the > testcase into two to make it more reliable. I positioned on the commit a4954130d4 (c++: define __cpp_pack_indexing [PR113798]), applied the patch, and got the same before/after figures. Also, both scan-tree-dump-not tests passed. I declared daxpy static in the new testcase. I am not sure what features to scan in assembly. > I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu > generates for the innermost loop of dgefa > > .L12: > addu $3,$9,$2 > addu $3,$3,$8 > lwc1 $f1,0($3) > lwc1 $f0,0($2) > addiu $7,$7,1 > mul.s $f1,$f2,$f1 > addiu $2,$2,4 > slt $3,$7,$10 > add.s $f0,$f0,$f1 > .set noreorder > .set nomacro > bne $3,$0,.L12 > > and with the patch > > .L12: > addu $3,$9,$2 > addu $3,$3,$8 > lwc1 $f1,0($3) > lwc1 $f0,0($2) > addiu $7,$7,1 > mul.s $f1,$f2,$f1 > addiu $2,$2,4 > slt $3,$7,$10 > add.s $f0,$f0,$f1 > .set noreorder > .set nomacro > bne $3,$0,.L12 > > that's suspiciously identical?! In fact the whole testcase generates > identical code. Could you please elaborate what do you mean by r15-5650-gd9c908b7503965? We managed to generate different assembly code with a patch. We used the following command to achieve that: $PREFIX/bin/mips64-r6-linux-gnu-gcc test.c -O2 \ -fdump-tree-ivopts-details -S -o test.s, where $PREFIX is an install directory. Could you please share the command which you used to generate identical code? As I mentioned, we positioned on the commit a4954130d4 and applied the patch. > So besides not being able to see the actual problem (maybe I need some > -march/-mtune?) the actual issue I have with the patch is that aff_inv > is tried to be distributed to other components and for parts that fail > to be distributed we cost it via > > if (comp_inv != NULL_TREE) > cost = force_var_cost (data, comp_inv, inv_vars); > > simply ensuring the complexity is at least one would have been to > change that to > > if (comp_inv != NULL_TREE) > { > cost = force_var_cost (data, comp_inv, inv_vars); > /* Ensure complexity is at least one. */ > cost.complexity = MAX (1, cost.complexity); > } > > or alternatively just do that for the if (comp_inv && inv_expr && > !simple_inv) case The patch you posted related to complexity seems to be OK, but I would slightly modify it as follows: if (comp_inv != NULL_TREE) { cost = force_var_cost (data, comp_inv, inv_vars); cost.complexity += 1; } I said 'the complexity should be at least one' because it was zero in the before figure, but it is just one special case. > (it's a bit odd we adjust cost there only for 'inv_expr != NULL'). Could you please elaborate this sentence? My condition for complexity incrementing was: if ((inv_vars && *inv_vars && !bitmap_empty_p (*inv_vars)) || \ (inv_expr && *inv_expr != NULL)). > The patch you posted instead of just adjusting complexity seems to > change the way we distribute the invariant - in particular we now > distribute it to parts.offset even when that is not supported > (!(ok_with_ratio_p || ok_without_ratio_p)), that's an odd change. Could you please elaborate this because I think I don't understand you. The condition !(ok_with_ratio_p || ok_without_ratio_p) checks whether base + index [<< scale] is supported or not. It doesn't check whether base + offset is supported. For example, the code on the commit a4954130d4 at line 4767 of tree-ssa-loop-ivopts.cc checks addressing mode base + offset even if base + index is not checked. I think that base + index is not required condition for base + offset. Kind regards, Aleksandar ________________________________________ From: Richard Biener <richard.guent...@gmail.com> Sent: Thursday, November 28, 2024 1:45 PM To: Aleksandar Rakic Cc: gcc-patches@gcc.gnu.org; Djordje Todorovic; Jovan Dmitrovic Subject: Re: [Bug tree-optimization/109429] [PATCH v2] ivopts: fixed complexities On Wed, Sep 25, 2024 at 5:32 PM Aleksandar Rakic <aleksandar.ra...@htecgroup.com> wrote: > > Hi, > > I think I managed to fix indentation from the previous version. > > When comparing the tables showing the candidates for the group 1 before > and after applying this patch, it can be observed that complexities for > the candidates where the computation depends on the invariant > expressions or the invariant variables should be at least one, which > aligns with the approach used in the commit c2b64ce. Commit c2b64ce is +2017-05-11 Bin Cheng <bin.ch...@arm.com> + + * tree-ssa-address.c (struct mem_address): Move to header file. + (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global. + * tree-ssa-address.h (struct mem_address): Move from C file. + (valid_mem_ref_p, move_fixed_address_to_symbol): Declare. that hasn't any "approach" to complexity, it just makes a function global. > > ===== Before this patch ===== > Group 1: > cand cost compl. inv.expr. inv.vars > 1 11 0 5; NIL; > 2 11 0 6; NIL; > 4 8 0 7; NIL; > 5 9 0 8; NIL; > 6 1 0 NIL; NIL; > 7 1 1 NIL; NIL; > 9 7 0 5; NIL; > ===== Before this patch ===== > ===== After this patch ===== > Group 1: > cand cost compl. inv.expr. inv.vars > 1 11 2 4; NIL; why does complexity go up to 2 from 0 here? > 2 11 1 4; NIL; > 4 8 1 5; NIL; > 5 8 2 6; NIL; Likewise, and why does cost change? > 6 1 0 NIL; NIL; > 7 1 1 NIL; NIL; > 9 7 2 4; NIL; Likewise. This comparison is probably not very useful without showing the actual candidate and its uses? The above before/after figures do not match the testcase ontop of trunk. > ===== After this patch ===== > > Hence, if the invariant expressions or the invariant variables are used > when representing use with candidate, the complexity should be larger > for more complex expressions, so it is incremented by one. I am not sure > whether inv_present could be expressed as parts. The testcase looks mips specific - it has just scan-tree-dump-not which is probably easily confused to PASS when it regressed. Can you instead add a gcc.target/mips/ testcase that scans for actual assembler features? If the testcase relies on inlining daxpy then declaring that static helps that you just get dgefa in the final assembly. If you want to scan both functions I suggest to split the testcase into two to make it more reliable. I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu generates for the innermost loop of dgefa .L12: addu $3,$9,$2 addu $3,$3,$8 lwc1 $f1,0($3) lwc1 $f0,0($2) addiu $7,$7,1 mul.s $f1,$f2,$f1 addiu $2,$2,4 slt $3,$7,$10 add.s $f0,$f0,$f1 .set noreorder .set nomacro bne $3,$0,.L12 and with the patch .L12: addu $3,$9,$2 addu $3,$3,$8 lwc1 $f1,0($3) lwc1 $f0,0($2) addiu $7,$7,1 mul.s $f1,$f2,$f1 addiu $2,$2,4 slt $3,$7,$10 add.s $f0,$f0,$f1 .set noreorder .set nomacro bne $3,$0,.L12 that's suspiciously identical?! In fact the whole testcase generates identical code. So besides not being able to see the actual problem (maybe I need some -march/-mtune?) the actual issue I have with the patch is that aff_inv is tried to be distributed to other components and for parts that fail to be distributed we cost it via if (comp_inv != NULL_TREE) cost = force_var_cost (data, comp_inv, inv_vars); simply ensuring the complexity is at least one would have been to change that to if (comp_inv != NULL_TREE) { cost = force_var_cost (data, comp_inv, inv_vars); /* Ensure complexity is at least one. */ cost.complexity = MAX (1, cost.complexity); } or alternatively just do that for the if (comp_inv && inv_expr && !simple_inv) case (it's a bit odd we adjust cost there only for 'inv_expr != NULL'). The patch you posted instead of just adjusting complexity seems to change the way we distribute the invariant - in particular we now distribute it to parts.offset even when that is not supported (!(ok_with_ratio_p || ok_without_ratio_p)), that's an odd change. complexity is supposed to be a tie-breaker, so I think that having bigger complexity for when we can't move it fully to index is OK - in the end any part that cannot be moved will end up being applied to base I think (that we have essentially two functions for this, one for costing and one for actual code emission, is a bit unfortunate). In the end I can't ack this patch as I cannot reproduce an effect on the testcase you added and because of the odd change part of the patch which is clearly not only doing what it's description says. Thanks, Richard. > Regards, > Aleksandar
/* { dg-do compile { target mips64-r6-linux-gnu } } */ /* { dg-options "-O2 -fdump-tree-ivopts-details" } */ /* This test ensures that complexity must be greater than zero if there is an invariant variable or an invariant expression in the address expression. */ static void daxpy (float *vector1, float *vector2, int n, float fp_const) { for (int i = 0; i < n; ++i) vector1[i] += fp_const * vector2[i]; } void dgefa (float *vector, int m, int n, int l) { for (int i = 0; i < n - 1; ++i) { for (int j = i + 1; j < n; ++j) { float t = vector[m * j + l]; daxpy (&vector[m * i + i + 1], &vector[m * j + i + 1], n - (i + 1), t); } } } /* { dg-final { scan-tree-dump-not "Processing loop 3(.*\n)*<IV Groups>:(.*\n)*Group 1:\n Type:.*ADDRESS(.*\n)*Group 1:\n cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*(.+\t.+\t0\t\\d+(, \\d+)*;\t.+\n)(.*\n)*Group 2:\n cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*Selected IV set for loop 3" "ivopts" { target { mips64-r6-linux-gnu } } } } */ /* { dg-final { scan-tree-dump-not "Processing loop 3(.*\n)*<IV Groups>:(.*\n)*Group 1:\n Type:.*ADDRESS(.*\n)*Group 1:\n cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*(.+\t.+\t0\t.+\t\\d+(, \\d+)*\n)(.*\n)*Group 2:\n cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*Selected IV set for loop 3" "ivopts" { target { mips64-r6-linux-gnu } } } } */
test_ivopts_bug_after_patch.s
Description: test_ivopts_bug_after_patch.s
test_ivopts_bug_before_patch.s
Description: test_ivopts_bug_before_patch.s
tree-ivopts-details-before.c.188t.ivopts
Description: tree-ivopts-details-before.c.188t.ivopts
tree-ivopts-details-after.c.188t.ivopts
Description: tree-ivopts-details-after.c.188t.ivopts