https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80833
--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
Atom's movd xmm->int is slower (lat=4, rtput=2) than its movd int->xmm (lat=3, rtput=1), which is the opposite of every other CPU (except Silvermont, where they have the same throughput but xmm->int is 1c slower). So store/reload is very likely the way to go for -mtune=atom, since store-forwarding is so amazingly fast (1c latency). But with SSE4 pextrd, the code-size saving may be worth it.
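
To make the two strategies concrete, here is a minimal C sketch (not taken from GCC or from the bug's testcase) that splits the low 64 bits of an xmm register into two 32-bit integers, once with register-to-register moves and once with a store followed by integer reloads. All function names are hypothetical and chosen for illustration. The comments show the instruction each intrinsic usually maps to, but the compiler is free to choose differently, so the real choice has to be checked in the generated assembly (e.g. from gcc -O2 -S). The SSE2 paths are guarded so that the file still builds on targets without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* All function names below are hypothetical, for illustration only. */

/* Portable reference: split a 64-bit value into 32-bit halves. */
static void split_scalar(uint64_t x, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)x;
    *hi = (uint32_t)(x >> 32);
}

#if defined(__SSE2__)
/* Strategy A: direct xmm->int moves (the direction that is slow on Atom). */
static void split_via_movd(__m128i v, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)_mm_cvtsi128_si32(v);                      /* movd r32, xmm */
    *hi = (uint32_t)_mm_cvtsi128_si32(_mm_srli_epi64(v, 32));  /* psrlq + movd  */
}

/* Strategy B: store the low 64 bits once, then reload each half as an
 * integer, relying on store-forwarding to keep the reloads cheap. */
static void split_via_store(__m128i v, uint32_t *lo, uint32_t *hi)
{
    uint32_t buf[2];
    _mm_storel_epi64((__m128i *)buf, v);  /* movq m64, xmm */
    *lo = buf[0];                         /* mov r32, m32  */
    *hi = buf[1];                         /* mov r32, m32  */
}
#endif

/* Returns 1 if every available strategy matches the scalar reference. */
int strategies_agree(uint64_t x)
{
    uint32_t lo, hi;
    split_scalar(x, &lo, &hi);
#if defined(__SSE2__)
    __m128i v = _mm_loadl_epi64((const __m128i *)&x);  /* movq xmm, m64 */
    uint32_t alo, ahi, blo, bhi;
    split_via_movd(v, &alo, &ahi);
    split_via_store(v, &blo, &bhi);
    return alo == lo && ahi == hi && blo == lo && bhi == hi;
#else
    (void)lo;
    (void)hi;
    return 1;  /* no SSE2: only the scalar reference is compiled */
#endif
}
```

Both strategies produce the same values; the question for -mtune=atom is only which instruction sequence is cheaper. With SSE4.1 available, the shift-plus-movd pair for the high half could be a single pextrd, which is the code-size saving mentioned above.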