Hi,

I'm investigating various register allocation inefficiencies. The first thing that stands out is that GCC supports both caller-saves and spilling. Spilling appears to spill every definition and every use of a live range, which means you often end up with multiple reloads close together where a single load followed by several uses of the loaded value would be more efficient. Caller-save does better in that case, but it is inefficient in that it stores and reloads registers around every call, even when the value has not changed. If both were fixed to minimise the number of loads and stores, I can't see how one could beat the other, so you'd no longer need both.
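To make the trade-off concrete, here is a minimal C sketch of the kind of code I have in mind (my own reduction rather than an actual testcase; the constant value and the function g are made up, only the callee f matches the AArch64 example below):

float f (float);

float
g (float x)
{
  float c = 0.1f;   /* Loaded from the literal pool, live across the call.  */
  x = f (x * c);
  return x * c;     /* Caller-save reloads c from the stack; a spill could
                       simply rematerialize it from the literal pool.  */
}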
Anyway, given the current implementation there are clearly cases where caller-save is best and cases where spilling is best. However, I don't see it making the right decision even though it tries to account for the costs: some code is significantly faster with -fno-caller-saves, other code wins with -fcaller-saves.

As an example, I see code like this on AArch64:

	ldr	s4, .LC20
	fmul	s0, s0, s4
	str	s4, [x29, 104]
	bl	f
	ldr	s4, [x29, 104]
	fmul	s0, s0, s4

With -fno-caller-saves it spills and rematerializes the constant as you'd expect:

	ldr	s1, .LC20
	fmul	s0, s0, s1
	bl	f
	ldr	s5, .LC20
	fmul	s0, s0, s5

So given this, is the cost calculation correct, and does it include rematerialization? The spill code understands how to rematerialize, so the costs should take this into account. I did find some code in scan_one_insn() in ira-costs.c that attempts something that looks like an adjustment for rematerialization, but it doesn't appear to handle all cases (simple immediates, 2-instruction immediates, address constants, non-aliased loads such as literal pool and const data loads).

Also, the hook CALLER_SAVE_PROFITABLE appears to have disappeared. Overall performance improves significantly if I add this (basically the old default heuristic applied to the instruction frequencies):

--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2230,6 +2230,8 @@ ira_tune_allocno_costs (void)
 		     * ALLOCNO_FREQ (a)
 		     * IRA_HARD_REGNO_ADD_COST_MULTIPLIER (regno) / 2);
 #endif
+	  if (ALLOCNO_FREQ (a) < 4 * ALLOCNO_CALL_FREQ (a))
+	    cost = INT_MAX;
 	}
       if (INT_MAX - cost < reg_costs[j])
 	reg_costs[j] = INT_MAX;

If such a simple heuristic can beat the costs, they can't be quite right. Is there anyone who understands the cost calculations?

Wilco
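P.S. For reference, as far as I remember the default definition of the removed hook was roughly the following (so the hunk above essentially just reinstates that check on the IRA frequencies):

/* Old default, quoted from memory - not current GCC code.  */
#define CALLER_SAVE_PROFITABLE(REFS, CALLS)  (4 * (CALLS) < (REFS))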