https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90204
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
Also what's better between aligned load/store of smaller size VS unaligned
load/store of bigger size?

aligned load/store of smaller size:

        movq    %rdx, (%rdi)
        movq    -56(%rsp), %rdx
        movq    %rdx, 8(%rdi)
        movq    -48(%rsp), %rdx
        movq    %rdx, 16(%rdi)
        movq    -40(%rsp), %rdx
        movq    %rdx, 24(%rdi)
        vmovq   %xmm0, 32(%rax)
        movq    -24(%rsp), %rdx
        movq    %rdx, 40(%rdi)
        movq    -16(%rsp), %rdx
        movq    %rdx, 48(%rdi)
        movq    -8(%rsp), %rdx
        movq    %rdx, 56(%rdi)

unaligned load/store of bigger size:

        vmovups %xmm2, (%rdi)
        vmovups %xmm3, 16(%rdi)
        vmovups %xmm4, 32(%rdi)
        vmovups %xmm5, 48(%rdi)
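
For reference, a minimal C sketch of the two strategies being compared, for a 64-byte copy. The helper names copy64_by_8 and copy64_by_16 and the chunk16 type are hypothetical and are not taken from the PR's testcase. Whether a compiler actually lowers them to movq or vmovups sequences like the ones above depends on the target flags and optimization level, so this only illustrates the access granularity, not the generated code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 64 bytes as eight 8-byte chunks,
   the granularity of the movq sequence. */
static void copy64_by_8(void *dst, const void *src)
{
    for (size_t i = 0; i < 64; i += 8) {
        uint64_t tmp;
        memcpy(&tmp, (const char *)src + i, sizeof tmp);
        memcpy((char *)dst + i, &tmp, sizeof tmp);
    }
}

/* A 16-byte chunk with alignment 1, so nothing is assumed about
   the alignment of dst or src. */
typedef struct { unsigned char b[16]; } chunk16;

/* Hypothetical helper: copy 64 bytes as four 16-byte chunks,
   the granularity of the vmovups sequence. */
static void copy64_by_16(void *dst, const void *src)
{
    for (size_t i = 0; i < 64; i += 16) {
        chunk16 tmp;
        memcpy(&tmp, (const char *)src + i, sizeof tmp);
        memcpy((char *)dst + i, &tmp, sizeof tmp);
    }
}
```

Both helpers produce the same bytes in dst; the open question in this comment is only which access pattern is cheaper on the target.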