https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #10)
> (In reply to Richard Biener from comment #8)
> > So w/ -Ofast -march=znver2 I get a runtime of 130 seconds, when I add
> > -mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec then
> > this improves to 114 seconds, with sink2 disabled I get 108 seconds
> > and with the tune-ctrl ontop I get 113 seconds.
> > 
> > Note that Zen2 is quite special in that it has the ability to handle
> > load/store from the stack by mapping it to a register, effectively
> > making them zero latency (zen3 lost this ability).
> > 
> > So while moves between GPRs and XMM might not be bad anymore _spilling_
> > to a GPR (and I suppose XMM, too) is still a bad idea and the stack
> > should be preferred.
> > 
> 
> According to znver2_cost
> 
> Cost of sse_to_integer is a little bit less than fp_store, maybe increase
> sse_to_integer cost(more than fp_store) can helps RA to choose memory
> instead of GPR.

That sounds reasonable - GPR<->xmm is cheaper than GPR -> stack -> xmm
but GPR<->xmm should be more expensive than GPR/xmm<->stack.  As said above
Zen2 can do reg -> mem, mem -> reg via renaming if 'mem' is somewhat special,
but modeling that doesn't seem to be necessary.

We seem to have store costs of 8 and load costs of 6, I'll try bumping the
gpr<->xmm move cost to 8.

Reply via email to