https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66925
--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> Some processors moving between the GPR via vmovd is slower than moving via
> memory.  So that is the reason why using -march=sandybridge or
> -march=ivybridge makes the issue go away.

You can also use -mtune=intel.
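For illustration only, here is a minimal sketch of code where the compiler has to get a value from a general-purpose register into a vector register (and back) on x86-64. It is not the test case from this report; the function names and the file name example.c are hypothetical. Comparing the assembly produced with different -march/-mtune settings may show either a direct vmovd or a store/load through the stack, depending on the GCC version and the tuning in effect.

```c
/* Hypothetical example, not the test case from this bug.
 *
 * Possible ways to compare the generated code (output not shown here,
 * since it depends on the GCC version and target tuning):
 *   gcc -O2 -mavx -S example.c
 *   gcc -O2 -mavx -mtune=intel -S example.c
 *   gcc -O2 -march=sandybridge -S example.c
 */
#include <stdint.h>
#include <string.h>

/* On x86-64 the integer argument arrives in a GPR and the float result
 * is returned in a vector register, so a GPR -> XMM transfer is needed. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined type punning */
    return f;
}

/* The reverse direction: XMM -> GPR. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Whether the transfer goes through memory or through vmovd is a tuning decision, so the two functions behave identically either way; only the instruction sequence differs.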