https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005
--- Comment #4 from Joel Holdsworth <joel at airwebreathe dot org.uk> ---
Results for clang and MSVC are similar:

clang trunk:

foo(__simd128_int32_t):
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #24
        bfc     sp, #0, #4
        mov     r0, sp
        vst1.32 {d0, d1}, [r0]
        vld1.64 {d0, d1}, [r0:128]
        mov     sp, r11
        pop     {r11, pc}

...but even though these other compilers don't do any better on ARM, I still
think my original point stands.