On 2/6/25 5:35 PM, Jan Hubicka wrote:

Register 3 (first caller saved) has cost 11000.  This comes from:
             add_cost = ((ira_memory_move_cost[mode][rclass][0]
                          + ira_memory_move_cost[mode][rclass][1])
                         * saved_nregs / hard_regno_nregs (hard_regno,
                                                           mode) - 1)
                                                                  ^^
                                                                  here
                        * (optimize_size ? 1 :
                           REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));

There is no comment explaining the -1, but I suppose it is there to bias
costs toward the prologue/epilogue instead of a caller-save sequence when
the runtime cost estimates are even.
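
For reference, the arithmetic of the quoted expression can be reproduced with a small standalone sketch. The function name and the inputs are illustrative assumptions, not GCC code: the two memory move costs are taken as 6 each and the entry block frequency as 1000, values picked only because they reproduce the 11000 figure above.

```c
#include <assert.h>

/* Standalone model of the quoted add_cost expression.  cost0 and cost1
   stand for the two ira_memory_move_cost[mode][rclass][0..1] entries,
   bias is the hard-wired 1, and freq stands for the value returned by
   REG_FREQ_FROM_BB for the entry block.  All inputs are assumptions
   supplied by the caller, not values read from IRA.  */
static int
callee_save_add_cost (int cost0, int cost1, int saved_nregs, int nregs,
                      int bias, int freq)
{
  assert (nregs > 0);
  /* Same evaluation order as the original: multiply, divide, subtract
     the bias, then scale by the frequency.  */
  return ((cost0 + cost1) * saved_nregs / nregs - bias) * freq;
}
```

With these assumed inputs, callee_save_add_cost (6, 6, 1, 1, 1, 1000) gives 11000, and a bias of 3 gives 9000.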

This is very old code.  As I remember, without it the RA avoided using a callee-saved reg at all, even in cases where the reg could be used for more than one pseudo.  The idea was also that on some targets typical saves/restores are cheaper (push/pop, multiple-register loads/stores) and that this should be taken into account somehow.



0000000000000000 <test>:
    0:  53                      push   %rbx             <--- callee save
    1:  89 fb                   mov    %edi,%ebx        <--- move 1
    3:  f7 db                   neg    %ebx
    5:  74 09                   je     10 <test+0x10>
    7:  e8 00 00 00 00          call   c <test+0xc>
    c:  89 d8                   mov    %ebx,%eax        <--- move 2
    e:  5b                      pop    %rbx             <--- callee restore
    f:  c3                      ret
   10:  e8 00 00 00 00          call   15 <test+0x15>
   15:  89 d8                   mov    %ebx,%eax        <--- move 2
   17:  5b                      pop    %rbx             <--- callee restore
   18:  c3                      ret

Mainline used EAX since it has cost 13000.  It is not 100% clear to me
why.

In many cases it is hard to find out why a particular cost occurs, because the costs are updated dynamically: assigning a pseudo to a hard reg can affect the cost for another pseudo that is not involved in a move with the first pseudo directly, but is connected to it through a chain of moves.  Those are complicated heuristics that were changed several times and verified by visible SPEC performance improvements.

So overall I think
  1) we can fix scaling of epilogue by exit block frequency
     to get noreturns right.
  2) we should drop the check for optimize_size.  Since with -Os
     REG_FREQ_FROM_BB always returns 1000, everything should be scaled
     the same way
  3) we currently have a hard-wired "-1" to bias the cost metric for
     callee saved registers.
     It may make sense to allow targets to control this, since e.g. x86
     has push/pop that is shorter. -3 would solve the testcase with neg
     and would express that push/pop is still cheaper with an extra
     reg-reg move.
  4) the cost model misses shrink wrapping, the fact that if a register
     is callee saved it may be used by multiple allocnos, and also that
     a push/pop sequence may avoid the need for manual RSP adjustments.
Shrink wrapping was a later addition to the RA, and I guess nobody thought about how to update the cost model to take it into account.
     Those seem a bit harder to fit in, though.

So if we want to go with the target hook, I think it should adjust the
cost before scaling (since targets may have special tricks for
prologues) rather than the scale factor (which is the target-independent
part of the cost model).
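
A minimal sketch of that split, assuming a hook that receives the unscaled cost and returns an adjusted one.  The hook type and every function name here are hypothetical; none of them is an existing GCC target hook or IRA function.  The -1 and -3 adjustments are the ones discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type: takes the unscaled callee-save cost and
   returns the target-adjusted cost.  */
typedef int (*callee_save_cost_hook) (int unscaled_cost);

/* Hypothetical default, equivalent to the hard-wired "-1".  */
static int
default_adjust (int unscaled_cost)
{
  return unscaled_cost - 1;
}

/* Hypothetical x86-like variant using the "-3" suggested above to
   express that push/pop is cheap.  */
static int
x86_like_adjust (int unscaled_cost)
{
  return unscaled_cost - 3;
}

/* The target-specific adjustment is applied first; scaling by the
   entry frequency stays in the target-independent part.  A null hook
   means no adjustment.  */
static int
scaled_callee_save_cost (int mem_cost_sum, int saved_nregs, int nregs,
                         callee_save_cost_hook adjust, int entry_freq)
{
  assert (nregs > 0);
  int unscaled = mem_cost_sum * saved_nregs / nregs;
  if (adjust != NULL)
    unscaled = adjust (unscaled);
  return unscaled * entry_freq;
}
```

With an assumed memory cost sum of 12 and a frequency of 1000, the default hook gives 11000 and the x86-like hook gives 9000; in both cases the scale factor itself is left untouched.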

Very nice analysis, Honza.  I believe we still need a hook and I'll work on the target hook improvement.

