https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97875
--- Comment #5 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Interestingly, if I make arm_builtin_support_vector_misalignment() behave the same for MVE and Neon, the generated code (with __restrict__) becomes:

test_vsub_i32:
	@ args = 0, pretend = 0, frame = 16
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	push	{r4, r5, r6, r7, r8, r9, r10, fp}	@ 61	[c=8 l=4]  *push_multi
	ldrd	r10, fp, [r1, #8]	@ 75	[c=8 l=4]  *thumb2_ldrd
	ldrd	r6, r7, [r2, #8]	@ 76	[c=8 l=4]  *thumb2_ldrd
	ldr	r4, [r2]	@ 14	[c=12 l=4]  *thumb2_movsi_vfp/5
	ldr	r8, [r1]	@ 9	[c=12 l=4]  *thumb2_movsi_vfp/6
	ldr	r9, [r1, #4]	@ 10	[c=12 l=4]  *thumb2_movsi_vfp/6
	ldr	r5, [r2, #4]	@ 15	[c=12 l=4]  *thumb2_movsi_vfp/5
	vmov	d6, r8, r9	@ v4si	@ 35	[c=4 l=8]  *mve_movv4si/1
	vmov	d7, r10, fp
	vmov	d4, r4, r5	@ v4si	@ 36	[c=4 l=8]  *mve_movv4si/1
	vmov	d5, r6, r7
	sub	sp, sp, #16	@ 62	[c=4 l=4]  *arm_addsi3/11
	mov	r3, sp	@ 37	[c=4 l=2]  *thumb2_movsi_vfp/0
	vsub.i32	q3, q3, q2	@ 18	[c=80 l=4]  mve_vsubqv4si
	vstrw.32	q3, [r3]	@ 34	[c=4 l=4]  *mve_movv4si/7
	ldrd	r4, r1, [sp]	@ 77	[c=8 l=4]  *thumb2_ldrd_base
	ldrd	r2, r3, [sp, #8]	@ 78	[c=8 l=4]  *thumb2_ldrd
	strd	r4, r1, [r0]	@ 79	[c=8 l=4]  *thumb2_strd_base
	strd	r2, r3, [r0, #8]	@ 80	[c=8 l=4]  *thumb2_strd
	add	sp, sp, #16	@ 66	[c=4 l=4]  *arm_addsi3/5
	@ sp needed	@ 67	[c=8 l=0]  force_register_use
	pop	{r4, r5, r6, r7, r8, r9, r10, fp}	@ 68	[c=8 l=4]  *load_multiple_with_writeback
	bx	lr	@ 69	[c=8 l=4]  *thumb2_return

The Neon version has:

	vld1.32	{q8}, [r1]	@ 8	[c=8 l=4]  *movmisalignv4si_neon_load
	vld1.32	{q9}, [r2]	@ 9	[c=8 l=4]  *movmisalignv4si_neon_load
	vsub.i32	q8, q8, q9	@ 10	[c=80 l=4]  *subv4si3_neon
	vst1.32	{q8}, [r0]	@ 11	[c=8 l=4]  *movmisalignv4si_neon_store
	bx	lr	@ 21	[c=8 l=4]  *thumb2_return

So it seems MVE needs a movmisalign pattern: without one, the misaligned V4SI accesses are expanded into scalar ldr/ldrd/strd sequences plus a round trip through the stack, whereas Neon loads and stores the vectors directly with vld1.32/vst1.32 via *movmisalignv4si_neon_load/store.
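For reference, here is a minimal C sketch of the kind of function behind the listings above. The loop body is an assumption, since this comment does not quote the testcase source: a 4-element int32_t subtraction through plain pointers, with only the destination marked __restrict__. Nothing guarantees 16-byte alignment of dest, a or b, so turning the loop into a single vsub.i32 relies on misaligned V4SI loads and stores, which is what the movmisalign<mode> standard pattern provides.

```c
#include <stdint.h>

/* Hypothetical reconstruction of test_vsub_i32; the real testcase may differ.
   dest is __restrict__, so the compiler may assume it does not alias a or b,
   but none of the pointers carries a 16-byte alignment guarantee.  */
void test_vsub_i32 (int32_t *__restrict__ dest, int32_t *a, int32_t *b)
{
  /* Four int32_t lanes: one V4SI (q-register) worth of data.  */
  for (int i = 0; i < 4; i++)
    dest[i] = a[i] - b[i];
}
```

Calling it with interior pointers (for example a + 1 into a larger array) is one way to exercise the unaligned case at run time. The scalar semantics are the same whether or not the accesses end up vectorized.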