This was posted towards the end of stage 3, a few days before stage 4
started. Is it now too late to "ping"?
--Alan
Alan Lawrence wrote:
Nowadays, just storing the (bigendian-corrected) vector element to the address
generates exactly the same assembler for all cases except
{float,int,uint}64x1_t, where
st1 {v0.d}[0], [x0]
becomes
str d0, [x0]
This is not a problem, and the change will be much better for optimization
through the midend, as well as making use of previous improvements in error
reporting.
Also move the /* vst1q */ comment, which was a couple of intrinsics too late.
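
For illustration only, here is a rough model of the "pointer dereference plus lane lookup" shape described above. It is not the text of the patch: model_f32x2, model_lane and model_vst1_lane_f32 are hypothetical names standing in for float32x2_t, the big-endian lane correction and vst1_lane_f32, and the sketch assumes a compiler with the GCC/Clang vector extension. The run-time assert is only a stand-in for the compile-time lane check in the real header.

```c
#include <assert.h>

/* Hypothetical stand-in for float32x2_t, built on the GCC/Clang
   vector extension (two 4-byte floats in an 8-byte vector).  */
typedef float model_f32x2 __attribute__ ((vector_size (8)));

#define MODEL_NUM_LANES 2

/* Map an architectural lane number to a GCC vector index.
   Assumption: on big-endian AArch64 the order is reversed;
   everywhere else the index is used unchanged.  */
static inline int
model_lane (int lane)
{
#ifdef __AARCH64EB__
  return MODEL_NUM_LANES - 1 - lane;
#else
  return lane;
#endif
}

/* Store one lane with a plain C assignment through the pointer,
   rather than going through a dedicated store-lane builtin.  */
static inline void
model_vst1_lane_f32 (float *dst, model_f32x2 vec, int lane)
{
  /* Run-time stand-in for a compile-time lane bounds check.  */
  assert (lane >= 0 && lane < MODEL_NUM_LANES);
  *dst = vec[model_lane (lane)];
}
```

Because the store is an ordinary assignment, the midend sees a normal memory write of one scalar, which is the property the description above relies on.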
gcc/ChangeLog:
* config/aarch64/arm_neon.h (vst1_lane_f32, vst1_lane_f64,
vst1_lane_p8, vst1_lane_p16, vst1_lane_s8, vst1_lane_s16,
vst1_lane_s32, vst1_lane_s64, vst1_lane_u8, vst1_lane_u16,
vst1_lane_u32, vst1_lane_u64, vst1q_lane_f32, vst1q_lane_f64,
vst1q_lane_p8, vst1q_lane_p16, vst1q_lane_s8, vst1q_lane_s16,
vst1q_lane_s32, vst1q_lane_s64, vst1q_lane_u8, vst1q_lane_u16,
vst1q_lane_u32, vst1q_lane_u64): Reimplement with pointer dereference
and __aarch64_vget_lane_any.
Cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf.
Ok for trunk?
Cheers, Alan