On 8/22/25 3:25 AM, yes wrote:
From: "Cui, Lili" <lili....@intel.com>
Hi,
This patch aims to remove issue code in improve_allocation that was causing
expensive allocno overflows.
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
Yes, you can commit it into the trunk with minor change (see ChangeLog
entry below). Thank you for the patch.
And sorry for the delay with the review. When a RA patch changes
heuristics (in this case it also modifies code which was added to
improve exchange2 code), I usually do benchmarking too. I did it on
Intel CPU (i3-13600K). I confirm the improvements are visible. There
is moderate code size increases only on bwaves (0.37%) but on average
the code size decreases (the biggest decrease 0.41% was achieved on
exchange2).
I don't see any performance decrease (only some noise) but I see visible
performance improvements on perlbench and exchange2 (+4%) and on wrf
(+3%) which results in 1% SPECInt2017 improvement and 0.25% on
SPECFP2017. I will be not surprised if the patch can result in some
failures on some targets on tests which expect a particular code
generation (unfortunately we have many such over constraint tests). But
it is ok and if it happens it can be addressed separately. The most
credible RA test at least for me is always SPEC.
Again thank you for improving RA.
The original intention of this code was to allow more allocnos
to share the same register, but this led to expensive allocno
overflows. Extracted a small case (a bit large, see Bugzilla
PR117838 for details) from 548.exchange2_r to analyze this
register allocation issue.
Before improve_allocation function:
a537 (cost 1896, reg42)
a20 (cost 270, reg1)
a13 (cost 144, spill)
a551 (cost 70, reg40)
a5 (cost 43, spill)
a493 (cost 30, reg42)
a499 (cost 12, reg40)
------------------------------
Dump info in improve_allocation function:
Base:
Spilling a493r125 for a5r113
Spilling a573r202 for a5r113
Spilling a499r248 for a13r106
Spilling a551r120 for a13r106
Spilling a20r237 for a551r120
With patch:
Spilling a499r248 for a13r106
Spilling a551r120 for a13r106
Spilling a493r125 for a551r120
------------------------------
After assign_hard_reg (at the end of improve_allocation):
Base:
a537 (cost 1896, reg1)
a20 (cost 270, spill) -----> This is unreasonable
a13 (cost 144, reg40)
a551 (cost 70, reg1)
a5 (cost 43, reg42)
a493 (cost 30, spill)
a499 (cost 12, reg1)
With patch:
a537 (cost 1896, reg42)
a20 (cost 270, reg1)
a13 (cost 144, reg40)
a551 (cost 70, reg42)
a5 (cost 43, spill)
a493 (cost 30, spill)
a499 (cost 12, reg42)
-----------------------------
Collected spec2017 performance on Znver3/Graviton4/EMR/SRF for O2 and Ofast.
No performance regression was observed.
FOR multi-copy O2
SRF: 548.exchange2_r increased by 7.5%, 500.perlbench_r increased by 2.0%.
EMR: 548.exchange2_r increased by 4.5%, 500.perlbench_r increased by 1.7%.
Graviton4: 548.exchange2_r Increased by 2.2%, 511.povray_r increased by 2.8%.
Znver3 : 500.perlbench_r increased by 2.0%.
gcc/ChangeLog:
PR rtl-optimization/117838
* ira-color.cc (improve_allocation): Remove the issue code.
It is too ambiguous. Please change it to "Remove soft conflict related
code".
---
gcc/ira-color.cc | 41 +++++++++++------------------------------
1 file changed, 11 insertions(+), 30 deletions(-)
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 4b9296029cc..fa2ea61cadf 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3304,8 +3304,6 @@ improve_allocation (void)
assigning hard register to allocno A even without spilling
conflicting allocnos. */
continue;
- auto_bitmap allocnos_to_spill;
- HARD_REG_SET soft_conflict_regs = {};
mode = ALLOCNO_MODE (a);
nwords = ALLOCNO_NUM_OBJECTS (a);
/* Process each allocno conflicting with A and update the cost
@@ -3331,40 +3329,24 @@ improve_allocation (void)
ALLOCNO_COLOR_DATA (conflict_a)->temp = check;
if ((conflict_hregno = ALLOCNO_HARD_REGNO (conflict_a)) < 0)
continue;
- auto spill_a = ira_soft_conflict (a, conflict_a);
- if (spill_a)
- {
- if (!bitmap_set_bit (allocnos_to_spill,
- ALLOCNO_NUM (spill_a)))
- continue;
- ira_loop_border_costs border_costs (spill_a);
- spill_cost = border_costs.spill_inside_loop_cost ();
- }
+ spill_cost = ALLOCNO_UPDATED_MEMORY_COST (conflict_a);
+ k = (ira_class_hard_reg_index
+ [ALLOCNO_CLASS (conflict_a)][conflict_hregno]);
+ ira_assert (k >= 0);
+ if ((allocno_costs = ALLOCNO_HARD_REG_COSTS (conflict_a))
+ != NULL)
+ spill_cost -= allocno_costs[k];
else
- {
- spill_cost = ALLOCNO_UPDATED_MEMORY_COST (conflict_a);
- k = (ira_class_hard_reg_index
- [ALLOCNO_CLASS (conflict_a)][conflict_hregno]);
- ira_assert (k >= 0);
- if ((allocno_costs = ALLOCNO_HARD_REG_COSTS (conflict_a))
- != NULL)
- spill_cost -= allocno_costs[k];
- else
- spill_cost -= ALLOCNO_UPDATED_CLASS_COST (conflict_a);
- spill_cost
- += allocno_copy_cost_saving (conflict_a, conflict_hregno);
- }
+ spill_cost -= ALLOCNO_UPDATED_CLASS_COST (conflict_a);
+ spill_cost
+ += allocno_copy_cost_saving (conflict_a, conflict_hregno);
conflict_nregs = hard_regno_nregs (conflict_hregno,
ALLOCNO_MODE (conflict_a));
auto note_conflict = [&](int r)
{
if (check_hard_reg_p (a, r,
conflicting_regs, profitable_hard_regs))
- {
- if (spill_a)
- SET_HARD_REG_BIT (soft_conflict_regs, r);
- costs[r] += spill_cost;
- }
+ costs[r] += spill_cost;
};
for (r = conflict_hregno;
r >= 0 && (int) end_hard_regno (mode, r) > conflict_hregno;
@@ -3396,7 +3378,6 @@ improve_allocation (void)
by spilling some conflicting allocnos does not improve the
allocation cost. */
continue;
- spill_soft_conflicts (a, allocnos_to_spill, soft_conflict_regs, best);
nregs = hard_regno_nregs (best, mode);
/* Now spill conflicting allocnos which contain a hard register
of A when we assign the best chosen hard register to it. */