On 10/3/23 10:07, Surya Kumari Jangala wrote:
ira: Scale save/restore costs of callee save registers with block frequency

In assign_hard_reg(), when computing the costs of the hard registers, the cost of saving/restoring a callee-save hard register in prolog/epilog is taken into consideration. However, this cost is not scaled with the entry block frequency. Without scaling, the cost of saving/restoring is quite small and this can result in a callee-save register being chosen by assign_hard_reg() even though there are free caller-save registers available. Assigning a callee save register to a pseudo that is live in the entire function and across a call will cause shrink wrap to fail.
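To make the effect described above concrete, here is a toy model of the cost comparison. It is only a sketch and not GCC code: the function names, the store/load cost of 4, and the frequencies are all hypothetical, and the way the caller-saved cost is weighted by the call frequency is a simplifying assumption.

```c
#include <assert.h>

/* Toy model only; none of these names or numbers come from GCC.  */

/* Prologue/epilogue cost of using a callee-saved register: one store plus
   one load, minus 1 (mirroring the "- 1" in the add_cost expression),
   times FREQ_SCALE.  Pass 1 for the unscaled variant, or the entry block
   frequency for the scaled one.  */
static int
callee_saved_cost (int store_cost, int load_cost, int freq_scale)
{
  assert (freq_scale >= 1);
  return (store_cost + load_cost - 1) * freq_scale;
}

/* Assumed cost of keeping the value in a caller-saved register: a save
   and a restore around the call, weighted by the call block frequency.  */
static int
caller_saved_cost (int store_cost, int load_cost, int call_freq)
{
  assert (call_freq >= 0);
  return (store_cost + load_cost) * call_freq;
}

/* Nonzero if the callee-saved register looks cheaper.  */
static int
prefers_callee_saved (int entry_freq_scale, int call_freq)
{
  /* Hypothetical costs: 4 per store and 4 per load.  */
  return callee_saved_cost (4, 4, entry_freq_scale)
         < caller_saved_cost (4, 4, call_freq);
}
```

With a hypothetical call frequency of 500, prefers_callee_saved (1, 500) is nonzero because the unscaled cost of 7 is far below 4000, while prefers_callee_saved (1000, 500) is zero because the scaled cost of 7000 exceeds it, so the caller-saved register wins in this model.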
Thank you for addressing this part of the code. Sometimes changes that look obvious have unexpected results. I remember experimenting with different heuristics for this code a long time ago, when 32-bit x86 was the major target, and this was the best variant I found. Since a lot has changed since then, I decided to benchmark your change.
This change increases x86-64 spec2017 code size by 0.67% on average. The increase is very stable across 20 spec2017 benchmarks; only the code for bwaves is smaller (by 0.01%). The specfp2017 performance is the same. There is one positive impact: specint2017 improved by 0.6% (8.59 vs 8.54), mainly because of improvements in xalancbmk (2.5%) and exchange2 (5%).
So I propose to make this change only when we are not optimizing for code size. Also, please be prepared for possible testsuite failures on other targets: some targets are overconstrained by tests that expect specific generated code.
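As a rough illustration of the proposal, the scaling could be made conditional on the optimization goal. The sketch below is a toy stand-in, not a patch: the function and parameter names are hypothetical. In GCC itself the condition might be expressed with a predicate such as optimize_function_for_size_p (cfun), but that choice is an assumption here and is not part of the posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sketch, not GCC code: the function and parameter names are
   hypothetical.  ADD_COST is the unscaled save/restore cost and
   ENTRY_FREQ the entry block frequency.  */
static int
save_restore_cost (int add_cost, int entry_freq, bool optimize_size_p)
{
  assert (add_cost >= 0 && entry_freq >= 0);
  /* When optimizing for size, keep the old unscaled cost; otherwise
     scale by the entry block frequency as the patch does.  */
  return optimize_size_p ? add_cost : add_cost * entry_freq;
}
```

With the hypothetical values add_cost = 7 and entry_freq = 1000, this yields 7 when optimizing for size and 7000 otherwise.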
2023-10-03  Surya Kumari Jangala  <jskum...@linux.ibm.com>

gcc/
        PR rtl-optimization/111673
        * ira-color.cc (assign_hard_reg): Scale save/restore costs of
        callee save registers with block frequency.

gcc/testsuite/
        PR rtl-optimization/111673
        * gcc.target/powerpc/pr111673.c: New test.
---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index f2e8ea34152..eb20c52310d 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2175,7 +2175,8 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
             add_cost = ((ira_memory_move_cost[mode][rclass][0]
                          + ira_memory_move_cost[mode][rclass][1])
                         * saved_nregs / hard_regno_nregs (hard_regno,
-                                                          mode) - 1);
+                                                          mode) - 1)
+                       * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun));
             cost += add_cost;
             full_cost += add_cost;
           }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111673.c b/gcc/testsuite/gcc.target/powerpc/pr111673.c
new file mode 100644
index 00000000000..e0c0f85460a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111673.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+
+int f (int);
+int
+advance (int dz)
+{
+  if (dz > 0)
+    return (dz + dz) * dz;
+  else
+    return dz * f (dz);
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */