http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980
ktkachov at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org
--- Comment #8 from ktkachov at gcc dot gnu.org ---
> arm-none-eabi-gcc -march=armv7-a -mfpu=neon -mfloat-abi=softfp -O2 -mthumb:
> sqrlen4D_16u8:
> vmov d18, r0, r1 @ v16qi
> vmov d19, r2, r3
> vld1.64 {d16-d17}, [sp:64]
> vabd.u8 q8, q9, q8
> vmull.u8 q9, d16, d16
> vmull.u8 q8, d17, d17
> vuzp.32 q9, q8
> vpaddl.u16 q9, q9
> vmov q10, q9 @ v4si
> vpadal.u16 q10, q8
> vmov r0, r1, d20 @ v4si
> vmov r2, r3, d21
> bx lr
With current trunk I'm getting the following for the softfp case:
push {lr} @ 40 *push_multi [length = 2]
vmov d16, r0, r1 @ v16qi @ 37 *neon_movv16qi/6 [length = 8]
vmov d17, r2, r3
add lr, sp, #4 @ 36 *arm_addsi3/5 [length = 4]
vldr d18, [sp, #4] @ 3 *neon_movv16qi/4 [length = 8]
vldr d19, [sp, #12]
vabd.u8 q9, q8, q9 @ 7 neon_vabdv16qi [length = 4]
vmull.u8 q8, d18, d18 @ 14 neon_vmullv8qi [length = 4]
vmull.u8 q9, d19, d19 @ 16 neon_vmullv8qi [length = 4]
vuzp.32 q8, q9 @ 18 *neon_vuzpv4si_insn [length = 4]
vpaddl.u16 q8, q8 @ 22 neon_vpaddlv8hi [length = 4]
vpadal.u16 q8, q9 @ 28 neon_vpadalv8hi [length = 4]
vmov r0, r1, d16 @ v4si @ 39 *neon_movv4si/5 [length = 8]
vmov r2, r3, d17
ldr pc, [sp], #4 @ 45 *ldr_with_return [length = 4]
The move between the vpaddl and vpadal is gone, but there are a couple of
redundant loads and some register spillage.