https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So w/ -Ofast -march=znver2 I get a runtime of 130 seconds. When I add
-mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec this improves
to 114 seconds. With sink2 disabled I get 108 seconds, and with the tune-ctrl
on top of that I get 113 seconds.

Note that Zen2 is quite special in that it has the ability to handle
load/store from the stack by mapping it to a register, effectively making
them zero latency (Zen3 lost this ability).

So while moves between GPRs and XMM might not be bad anymore, _spilling_ to a
GPR (and I suppose XMM, too) is still a bad idea and the stack should be
preferred. Not sure if it's possible to do that though.

Doing the same experiment as above on a Zen3 machine would be nice, too.
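
For comparison, the four runtimes quoted above can be expressed as reductions relative to the -Ofast -march=znver2 baseline. The following is only a small arithmetic sketch: the dictionary keys and the helper name `reduction_percent` are made up for illustration, and the numbers are the ones from the comment, not new measurements.

```python
# Runtimes in seconds, copied from the comment above (Zen2, -Ofast -march=znver2).
# The labels are illustrative shorthand, not GCC option names.
runtimes = {
    "baseline": 130,
    "tune-ctrl": 114,
    "sink2 disabled": 108,
    "sink2 disabled + tune-ctrl": 113,
}


def reduction_percent(baseline, measured):
    """Return the runtime reduction relative to the baseline, in percent."""
    return 100.0 * (baseline - measured) / baseline


base = runtimes["baseline"]
reductions = {name: reduction_percent(base, t) for name, t in runtimes.items()}

for name, pct in reductions.items():
    print(f"{name:28s} {runtimes[name]:4d} s  {pct:5.1f}% runtime reduction")
```

Read this way, the tune-ctrl change alone saves roughly 12%, disabling sink2 alone roughly 17%, and combining the two roughly 13%, so on this machine the tune-ctrl does not help once sink2 is already disabled.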