Ah, light dawns (maybe). I guess the problems stem from the attempts to combine Neon with ARMv5. Neon shouldn't be used with anything prior to ARMv7, since that's the earliest version of the architecture that can support it.
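To make the mismatch concrete, here is a minimal standalone sketch, not GCC code. The struct and both function names are hypothetical, made up for illustration. `resolve_unaligned_access` only mirrors the default logic in the arm.c fragment quoted further down, and `neon_config_is_consistent` is just one possible way to express "Neon implies at least ARMv7".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the target flags discussed in this
   thread.  The field names echo arm_arch6, arm_arch7 and arm_arch_notm,
   but this struct does not exist in GCC.  */
struct target_model
{
  bool arch6;     /* ARMv6 or later.  */
  bool arch7;     /* ARMv7 or later.  */
  bool arch_notm; /* Not an M-profile core.  */
  bool has_neon;  /* -mfpu=neon was requested.  */
};

/* Mirror of the quoted arm.c logic.  REQUESTED is 2 when the user gave
   no -munaligned-access option, otherwise 0 or 1.  Returns the
   effective setting.  */
static int
resolve_unaligned_access (const struct target_model *t, int requested)
{
  assert (requested >= 0 && requested <= 2);

  bool supported = t->arch6 && (t->arch_notm || t->arch7);

  if (requested == 2)
    return supported ? 1 : 0;
  if (requested == 1 && !supported)
    return 0; /* The real code also emits a warning here.  */
  return requested;
}

/* Hypothetical consistency check: Neon is only meaningful on ARMv7 or
   later, so reject has_neon on anything older.  */
static bool
neon_config_is_consistent (const struct target_model *t)
{
  return !t->has_neon || t->arch7;
}
```

With -march=armv5te -mfpu=neon this model ends up with Neon enabled and unaligned access disabled, which is exactly the combination that should never occur.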
I guess that what is happening is that we see we have Neon, so start to
generate a Neon-based copy sequence, but then notice that we don't have
misaligned access (something that must exist if we have Neon) and
generate VLDR instructions in a mistaken attempt to work around the
first inconsistency.

Maybe we should tie -mfpu=neon to having at least ARMv7 (though ARMv6
also has misaligned access support).

R.

On 28/05/14 00:03, Maciej W. Rozycki wrote:
> On Tue, 27 May 2014, Kyrill Tkachov wrote:
>
>>>> This change however has regressed gcc.dg/vect/vect-72.c on the
>>>> arm-linux-gnueabi target, -march=armv5te, in particular in 4.8.
>>> And what are all the configure flags you are using in case some one
>>> has to reproduce this issue ?
>>
>> Second that. My recently built 4.8 (gcc version 4.8.2 20130531) for vect-72
>> with options:
>> -O2 -ftree-vectorize -march=armv5te -mfpu=neon -mfloat-abi=hard
>> -fno-vect-cost-model -fno-common
>>
>> gives code the same as your original one:
>> .L14:
>>         sub     r1, r3, #16
>>         add     r3, r3, #16
>>         vld1.8  {q8}, [r1]
>>         cmp     r3, r0
>>         vst1.64 {d16-d17}, [r2:64]!
>>         bne     .L14
>>         ldr     r3, .L22+12
>>         add     ip, r3, #128
>>         add     r2, r3, #129
>
> I have this:
>
> .L14:
>         sub     r1, r3, #16     @ 130 *arm_addsi3/7 [length = 4]
>         add     r3, r3, #16     @ 135 *arm_addsi3/2 [length = 4]
>         vld1.8  {q8}, [r1]      @ 131 *movmisalignv16qi_neon_load [length = 4]
>         cmp     r3, r0  @ 136 *arm_cmpsi_insn/3 [length = 4]
>         vst1.64 {d16-d17}, [r2:64]!     @ 133 *neon_movv16qi/2 [length = 8]
>         bne     .L14    @ 137 arm_cond_branch [length = 4]
>
> without and this:
>
> .L14:
>         vldr    d16, [r3, #-16] @ 130 *neon_movv16qi/4 [length = 8]
>         vldr    d17, [r3, #-8]
>         add     r3, r3, #16     @ 133 *arm_addsi3/2 [length = 4]
>         cmp     r3, r1  @ 134 *arm_cmpsi_insn/3 [length = 4]
>         vst1.64 {d16-d17}, [r2:64]!     @ 131 *neon_movv16qi/2 [length = 8]
>         bne     .L14    @ 135 arm_cond_branch [length = 4]
>
> with your change applied respectively so clearly it's making its intended
> effect of disabling the use of movmisalignv16qi_neon_load.
>
> However VLD1.8 can also be produced from other RTL patterns, or maybe you
> don't have `unaligned_access' set to zero for some reason, which you
> should (for ARMv5TE) according to this gcc/config/arm/arm.c piece:
>
>   /* Enable -munaligned-access by default for
>      - all ARMv6 architecture-based processors
>      - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
>      - ARMv8 architecture-base processors.
>
>      Disable -munaligned-access by default for
>      - all pre-ARMv6 architecture-based processors
>      - ARMv6-M architecture-based processors.  */
>
>   if (unaligned_access == 2)
>     {
>       if (arm_arch6 && (arm_arch_notm || arm_arch7))
>         unaligned_access = 1;
>       else
>         unaligned_access = 0;
>     }
>   else if (unaligned_access == 1
>            && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
>     {
>       warning (0, "target CPU does not support unaligned accesses");
>       unaligned_access = 0;
>     }
>
> -- can you build vect-72.c with -dp and see which pattern VLD1.8 is
> produced from in your case?
>
> As to the GCC configure options I have as I say nothing there beyond
> making -march=armv5te the default (surely you're not interested in
> --prefix, etc.).  For the record the test framework adds these options on
> top of that for this particular case:
>
> -fno-diagnostics-show-caret -mfpu=neon -mfloat-abi=softfp -ffast-math
> -ftree-vectorize -fno-vect-cost-model -fno-common -O2
> -fdump-tree-vect-details
>
> but these should be no different in your case (except perhaps from
> `-mfloat-abi=', but that shouldn't matter as this is no FP code).
>
> I have double-checked with current (r210984) 4.8 now and the issue is
> still there.
>
>   Maciej