https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071
H.J. Lu <hjl.tools at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |87007
--- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
vcvtsd2ss %xmm1, %xmm1, %xmm0
is faster than
vcvtsd2ss %xmm1, %xmm0, %xmm0
But the two-instruction sequence
vxorps %xmm0, %xmm0, %xmm0
vcvtsd2ss %xmm1, %xmm0, %xmm0
is faster than both. I have a patch for PR 87007:
https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00298.html
which inserts a vxorps at the last possible position. With it, vxorps
is executed only once per function.
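
For context, here is a minimal C sketch of the kind of source that makes GCC emit the conversion discussed above. It is not taken from this bug report or from the patch; the names to_float and narrow are hypothetical. It assumes GCC with AVX enabled, where a scalar double-to-float cast is typically compiled to vcvtsd2ss. That instruction writes only the low 32 bits of the destination and copies the remaining bits from the middle operand (in AT&T syntax), so it has to wait for whatever last wrote that register. Zeroing the register with vxorps beforehand is the usual way to break that dependency.

```c
#include <stddef.h>

/* Hypothetical helper: narrow one double to float.  With AVX enabled,
   GCC typically implements this cast with vcvtsd2ss. */
float to_float(double x)
{
    return (float)x;
}

/* Hypothetical loop that performs the conversion repeatedly.  If the
   loop stays scalar, each conversion writes only the low 32 bits of
   its destination register and merges the rest from a source register,
   which is where the dependency discussed above comes from. */
void narrow(const double *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = to_float(src[i]);
}
```

Compiling with something like "gcc -O2 -mavx -S" shows the generated assembly. Which of the forms above appears, and whether the loop is vectorized instead, depends on the GCC version and the flags used.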
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007
[Bug 87007] [8/9 Regression] 10% slowdown with -march=skylake-avx512