On Wed, Sep 25, 2024 at 5:32 PM Aleksandar Rakic <aleksandar.ra...@htecgroup.com> wrote: > > Hi, > > I think I managed to fix indentation from the previous version. > > When comparing the tables showing the candidates for the group 1 before > and after applying this patch, it can be observed that complexities for > the candidates where the computation depends on the invariant > expressions or the invariant variables should be at least one, which > aligns with the approach used in the commit c2b64ce.
Commit c2b64ce is +2017-05-11 Bin Cheng <bin.ch...@arm.com> + + * tree-ssa-address.c (struct mem_address): Move to header file. + (valid_mem_ref_p, move_fixed_address_to_symbol): Make it global. + * tree-ssa-address.h (struct mem_address): Move from C file. + (valid_mem_ref_p, move_fixed_address_to_symbol): Declare. that hasn't any "approach" to complexity, it just makes a function global. > > ===== Before this patch ===== > Group 1: > cand cost compl. inv.expr. inv.vars > 1 11 0 5; NIL; > 2 11 0 6; NIL; > 4 8 0 7; NIL; > 5 9 0 8; NIL; > 6 1 0 NIL; NIL; > 7 1 1 NIL; NIL; > 9 7 0 5; NIL; > ===== Before this patch ===== > ===== After this patch ===== > Group 1: > cand cost compl. inv.expr. inv.vars > 1 11 2 4; NIL; why does complexity go up to 2 from 0 here? > 2 11 1 4; NIL; > 4 8 1 5; NIL; > 5 8 2 6; NIL; Likewise, and why does cost change? > 6 1 0 NIL; NIL; > 7 1 1 NIL; NIL; > 9 7 2 4; NIL; Likewise. This comparison is probably not very useful without showing the actual candidate and its uses? The above before/after figures do not match the testcase ontop of trunk. > ===== After this patch ===== > > Hence, if the invariant expressions or the invariant variables are used > when representing use with candidate, the complexity should be larger > for more complex expressions, so it is incremented by one. I am not sure > whether inv_present could be expressed as parts. The testcase looks mips specific - it has just scan-tree-dump-not which is probably easily confused to PASS when it regressed. Can you instead add a gcc.target/mips/ testcase that scans for actual assembler features? If the testcase relies on inlining daxpy then declaring that static helps that you just get dgefa in the final assembly. If you want to scan both functions I suggest to split the testcase into two to make it more reliable. I see r15-5650-gd9c908b7503965 for a --target=mips64-r6-linux-gnu generates for the innermost loop of dgefa .L12: addu $3,$9,$2 addu $3,$3,$8 lwc1 $f1,0($3) lwc1 $f0,0($2) addiu $7,$7,1 mul.s $f1,$f2,$f1 addiu $2,$2,4 slt $3,$7,$10 add.s $f0,$f0,$f1 .set noreorder .set nomacro bne $3,$0,.L12 and with the patch .L12: addu $3,$9,$2 addu $3,$3,$8 lwc1 $f1,0($3) lwc1 $f0,0($2) addiu $7,$7,1 mul.s $f1,$f2,$f1 addiu $2,$2,4 slt $3,$7,$10 add.s $f0,$f0,$f1 .set noreorder .set nomacro bne $3,$0,.L12 that's suspiciously identical?! In fact the whole testcase generates identical code. So besides not being able to see the actual problem (maybe I need some -march/-mtune?) the actual issue I have with the patch is that aff_inv is tried to be distributed to other components and for parts that fail to be distributed we cost it via if (comp_inv != NULL_TREE) cost = force_var_cost (data, comp_inv, inv_vars); simply ensuring the complexity is at least one would have been to change that to if (comp_inv != NULL_TREE) { cost = force_var_cost (data, comp_inv, inv_vars); /* Ensure complexity is at least one. */ cost.complexity = MAX (1, cost.complexity); } or alternatively just do that for the if (comp_inv && inv_expr && !simple_inv) case (it's a bit odd we adjust cost there only for 'inv_expr != NULL'). The patch you posted instead of just adjusting complexity seems to change the way we distribute the invariant - in particular we now distribute it to parts.offset even when that is not supported (!(ok_with_ratio_p || ok_without_ratio_p)), that's an odd change. complexity is supposed to be a tie-breaker, so I think that having bigger complexity for when we can't move it fully to index is OK - in the end any part that cannot be moved will end up being applied to base I think (that we have essentially two functions for this, one for costing and one for actual code emission, is a bit unfortunate). In the end I can't ack this patch as I cannot reproduce an effect on the testcase you added and because of the odd change part of the patch which is clearly not only doing what it's description says. Thanks, Richard. > Regards, > Aleksandar