https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #10)
> (In reply to Richard Biener from comment #8)
> > So w/ -Ofast -march=znver2 I get a runtime of 130 seconds, when I add
> > -mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec then
> > this improves to 114 seconds, with sink2 disabled I get 108 seconds
> > and with the tune-ctrl ontop I get 113 seconds.
> >
> > Note that Zen2 is quite special in that it has the ability to handle
> > load/store from the stack by mapping it to a register, effectively
> > making them zero latency (zen3 lost this ability).
> >
> > So while moves between GPRs and XMM might not be bad anymore _spilling_
> > to a GPR (and I suppose XMM, too) is still a bad idea and the stack
> > should be preferred.
> >
>
> According to znver2_cost
>
> Cost of sse_to_integer is a little bit less than fp_store, maybe increase
> sse_to_integer cost(more than fp_store) can helps RA to choose memory
> instead of GPR.

That sounds reasonable - GPR<->xmm is cheaper than GPR -> stack -> xmm
but GPR<->xmm should be more expensive than GPR/xmm<->stack.  As said above
Zen2 can do reg -> mem, mem -> reg via renaming if 'mem' is somewhat special,
but modeling that doesn't seem to be necessary.

We seem to have store costs of 8 and load costs of 6, I'll try bumping the
gpr<->xmm move cost to 8.
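
To make the cost relation above concrete, here is a minimal sketch that checks a candidate GPR<->xmm move cost against the two constraints. The function and variable names are hypothetical and are not fields of GCC's cost tables; only the values 8 (store), 6 (load) and 8 (proposed move cost) come from the comment. It also assumes "more expensive" can be read as "at least as expensive", since the proposed cost equals the store cost.

```python
# Illustrative only: names here are hypothetical, not GCC's cost-table fields.

def check_move_cost(direct_move, store, load):
    """Check a direct GPR<->xmm move cost against the two constraints.

    Returns a pair of booleans:
      1. the direct move is cheaper than the GPR -> stack -> xmm round trip
      2. the direct move is not cheaper than a single store or load
         (assumption: "more expensive" is read as "at least as expensive")
    """
    via_stack = store + load  # one store followed by one load
    cheaper_than_round_trip = direct_move < via_stack
    not_cheaper_than_spill = direct_move >= max(store, load)
    return cheaper_than_round_trip, not_cheaper_than_spill


# Values quoted in the comment: store cost 8, load cost 6, proposed move cost 8.
STORE_COST = 8
LOAD_COST = 6
PROPOSED_MOVE_COST = 8

print(check_move_cost(PROPOSED_MOVE_COST, STORE_COST, LOAD_COST))
```

With the quoted numbers both checks hold: 8 is below the round-trip cost of 14 and is not below the store cost of 8. Whether the register allocator then actually prefers the stack is a separate question that only the benchmark can answer.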