https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87455
--- Comment #3 from Fanael <fanael4 at gmail dot com> ---
> May be we should remove xorps generation part.

If it were up to me, I'd keep it for BDVER[1234] only, because xorps is still one byte shorter than either xorpd or pxor and is as fast there. I'd also introduce a separate tune option specifically for untyped vector *moves*. That option would apply to BD, but also to Zen, Pentium M, Core, Skylake and other µarches where register-to-register vector moves are renamed (as in Zen), untyped (as in Skylake) or always of the same type (as in Core). It would not apply to anything in between, i.e. Nehalem to Broadwell, though my data on Ivy Bridge, Haswell and Broadwell is not conclusive.
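
To make the "one byte shorter" point concrete, here is a small sketch of the legacy (non-VEX) SSE encodings of the three zeroing idioms. It assumes the operand is xmm0, so no REX prefix is needed; the array names and the helper `bytes_saved_by_xorps` are made up for illustration and are not part of GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Legacy-SSE (non-VEX) machine code for zeroing xmm0.
   xorpd and pxor carry a mandatory 0x66 prefix; xorps does not. */
static const unsigned char xorps_xmm0[] = { 0x0F, 0x57, 0xC0 };       /* xorps %xmm0, %xmm0 */
static const unsigned char xorpd_xmm0[] = { 0x66, 0x0F, 0x57, 0xC0 }; /* xorpd %xmm0, %xmm0 */
static const unsigned char pxor_xmm0[]  = { 0x66, 0x0F, 0xEF, 0xC0 }; /* pxor  %xmm0, %xmm0 */

/* Hypothetical helper: bytes saved by emitting xorps instead of an
   alternative encoding that is other_size bytes long. */
static size_t bytes_saved_by_xorps(size_t other_size)
{
    assert(other_size >= sizeof xorps_xmm0);
    return other_size - sizeof xorps_xmm0;
}
```

Passing `sizeof xorpd_xmm0` or `sizeof pxor_xmm0` to the helper gives the size difference for each alternative. The saving comes only from the missing 0x66 prefix, so it applies to the legacy encodings; with the two-byte VEX prefix, vxorps and vpxor on xmm0 both encode in four bytes.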