[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 27 Jan 2022 01:34:47 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178


--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #10)
> (In reply to Richard Biener from comment #8)
> > So w/ -Ofast -march=znver2 I get a runtime of 130 seconds, when I add
> > -mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec then
> > this improves to 114 seconds, with sink2 disabled I get 108 seconds
> > and with the tune-ctrl ontop I get 113 seconds.
> > 
> > Note that Zen2 is quite special in that it has the ability to handle
> > load/store from the stack by mapping it to a register, effectively
> > making them zero latency (zen3 lost this ability).
> > 
> > So while moves between GPRs and XMM might not be bad anymore _spilling_
> > to a GPR (and I suppose XMM, too) is still a bad idea and the stack
> > should be preferred.
> > 
> 
> According to znver2_cost
> 
> Cost of sse_to_integer is a little bit less than fp_store, maybe increase
> sse_to_integer cost(more than fp_store) can helps RA to choose memory
> instead of GPR.

That sounds reasonable - GPR<->xmm is cheaper than GPR -> stack -> xmm
but GPR<->xmm should be more expensive than GPR/xmm<->stack.  As said above
Zen2 can do reg -> mem, mem -> reg via renaming if 'mem' is somewhat special,
but modeling that doesn't seem to be necessary.

We seem to have store costs of 8 and load costs of 6, I'll try bumping the
gpr<->xmm move cost to 8.

[Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22

Reply via email to