https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So with -Ofast -march=znver2 I get a runtime of 130 seconds. When I add
-mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec this
improves to 114 seconds. With sink2 disabled I get 108 seconds, and with
the tune-ctrl options on top of that I get 113 seconds.
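
For anyone wanting to repeat the four measurements above, here is a
minimal sketch that only builds the compiler command lines. Several
things in it are assumptions, not taken from the comment: the source
file name bench.c and the output names are hypothetical placeholders,
the compiler is assumed to be invoked as gcc, and -fdisable-tree-sink2
is assumed to be how sink2 was disabled (the comment does not say which
mechanism was used).

```python
import shlex

# Flags quoted in the comment above.
BASE = ["gcc", "-Ofast", "-march=znver2"]
TUNE = "-mtune-ctrl=^inter_unit_moves_from_vec,^inter_unit_moves_to_vec"

# Assumption: the sink2 pass is disabled via GCC's -fdisable-tree-<pass>
# developer option; the comment does not state how it was done.
NO_SINK2 = "-fdisable-tree-sink2"


def configurations(source="bench.c"):
    """Return the four build command lines as argv lists.

    `source` and the output names are hypothetical placeholders.
    """
    variants = {
        "baseline": [],
        "tune-ctrl": [TUNE],
        "no-sink2": [NO_SINK2],
        "no-sink2+tune-ctrl": [NO_SINK2, TUNE],
    }
    return {
        name: BASE + extra + [source, "-o", "bench-" + name]
        for name, extra in variants.items()
    }


if __name__ == "__main__":
    for name, argv in configurations().items():
        # shlex.join quotes the '^' so the line can be pasted into a shell.
        print(name + ": " + shlex.join(argv))
```

Each resulting binary would then be timed separately, on the same
machine and under the same load, so the four numbers are comparable.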

Note that Zen2 is quite special in that it has the ability to handle
loads and stores from the stack by mapping them to a register,
effectively making them zero latency (Zen3 lost this ability).

So while moves between GPRs and XMM might not be bad anymore, _spilling_
to a GPR (and I suppose XMM, too) is still a bad idea and the stack
should be preferred.

Not sure if it's possible to do that though.

Doing the same experiment as above on a Zen3 machine would be nice, too.
