https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97875
--- Comment #5 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Interestingly, if I make arm_builtin_support_vector_misalignment() behave the
same for MVE and Neon, the generated code (with __restrict__) becomes:
test_vsub_i32:
@ args = 0, pretend = 0, frame = 16
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
push {r4, r5, r6, r7, r8, r9, r10, fp} @ 61 [c=8 l=4] *push_multi
ldrd r10, fp, [r1, #8] @ 75 [c=8 l=4] *thumb2_ldrd
ldrd r6, r7, [r2, #8] @ 76 [c=8 l=4] *thumb2_ldrd
ldr r4, [r2] @ 14 [c=12 l=4] *thumb2_movsi_vfp/5
ldr r8, [r1] @ 9 [c=12 l=4] *thumb2_movsi_vfp/6
ldr r9, [r1, #4] @ 10 [c=12 l=4] *thumb2_movsi_vfp/6
ldr r5, [r2, #4] @ 15 [c=12 l=4] *thumb2_movsi_vfp/5
vmov d6, r8, r9 @ v4si @ 35 [c=4 l=8] *mve_movv4si/1
vmov d7, r10, fp
vmov d4, r4, r5 @ v4si @ 36 [c=4 l=8] *mve_movv4si/1
vmov d5, r6, r7
sub sp, sp, #16 @ 62 [c=4 l=4] *arm_addsi3/11
mov r3, sp @ 37 [c=4 l=2] *thumb2_movsi_vfp/0
vsub.i32 q3, q3, q2 @ 18 [c=80 l=4] mve_vsubqv4si
vstrw.32 q3, [r3] @ 34 [c=4 l=4] *mve_movv4si/7
ldrd r4, r1, [sp] @ 77 [c=8 l=4] *thumb2_ldrd_base
ldrd r2, r3, [sp, #8] @ 78 [c=8 l=4] *thumb2_ldrd
strd r4, r1, [r0] @ 79 [c=8 l=4] *thumb2_strd_base
strd r2, r3, [r0, #8] @ 80 [c=8 l=4] *thumb2_strd
add sp, sp, #16 @ 66 [c=4 l=4] *arm_addsi3/5
@ sp needed @ 67 [c=8 l=0] force_register_use
pop {r4, r5, r6, r7, r8, r9, r10, fp} @ 68 [c=8 l=4] *load_multiple_with_writeback
bx lr @ 69 [c=8 l=4] *thumb2_return
The Neon version has:
vld1.32 {q8}, [r1] @ 8 [c=8 l=4] *movmisalignv4si_neon_load
vld1.32 {q9}, [r2] @ 9 [c=8 l=4] *movmisalignv4si_neon_load
vsub.i32 q8, q8, q9 @ 10 [c=80 l=4] *subv4si3_neon
vst1.32 {q8}, [r0] @ 11 [c=8 l=4] *movmisalignv4si_neon_store
bx lr @ 21 [c=8 l=4] *thumb2_return
So it seems MVE needs a movmisalign pattern, like the
movmisalignv4si_neon_load/store patterns that Neon uses above.
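
For reference, this comment does not quote the source of test_vsub_i32. A
minimal sketch of what such a test might look like, assuming a four-element
int32_t subtraction with __restrict__-qualified pointers (the parameter names
and the const qualifiers are hypothetical, not taken from the PR):

```c
#include <stdint.h>

/* Hypothetical reconstruction: the comment does not quote the test source. */
void test_vsub_i32(int32_t *__restrict__ dest,
                   const int32_t *__restrict__ a,
                   const int32_t *__restrict__ b)
{
    /* Four 32-bit lanes fill one 128-bit q register (V4SI mode). */
    for (int i = 0; i < 4; i++)
        dest[i] = a[i] - b[i];
}
```

With int32_t pointers the compiler can only assume 4-byte alignment, not the
16 bytes of a full vector, which is presumably why the misalignment hook and
the movmisalign patterns come into play for these loads and stores.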