This was posted towards the end of stage 3, a few days before stage 4
started. Is it now too late to "ping"?

--Alan


Alan Lawrence wrote:
Nowadays, just storing the (big-endian-corrected) vector element to the address generates exactly the same assembler in all cases except {float,int,uint}64x1_t, where
st1 {v0.d}[0], [x0]
becomes
str d0, [x0]

This is not a problem, and the change will be much better for optimization through the midend, as well as making use of previous improvements in error reporting.

Also move the /* vst1q */ comment, which was a couple of intrinsics too late.

gcc/ChangeLog:

        * config/aarch64/arm_neon.h (vst1_lane_f32, vst1_lane_f64,
        vst1_lane_p8, vst1_lane_p16, vst1_lane_s8, vst1_lane_s16,
        vst1_lane_s32, vst1_lane_s64, vst1_lane_u8, vst1_lane_u16,
        vst1_lane_u32, vst1_lane_u64, vst1q_lane_f32, vst1q_lane_f64,
        vst1q_lane_p8, vst1q_lane_p16, vst1q_lane_s8, vst1q_lane_s16,
        vst1q_lane_s32, vst1q_lane_s64, vst1q_lane_u8, vst1q_lane_u16,
        vst1q_lane_u32, vst1q_lane_u64): Reimplement with pointer dereference
        and __aarch64_vget_lane_any.
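
For illustration only, the shape of "pointer dereference plus lane extract" can be sketched as below. This is not the patch itself: the sketch_* names are hypothetical, the vector type is a stand-in built on the GCC/Clang vector extension so that it compiles without arm_neon.h, and the big-endian lane correction done by the real __aarch64_vget_lane_any is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for int32x4_t, built on the GCC/Clang vector
   extension so the sketch does not need arm_neon.h.  */
typedef int32_t sketch_int32x4_t __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for __aarch64_vget_lane_any: check the lane
   index and read that element.  Assumption: no big-endian lane
   correction is applied here, unlike in the real header.  */
static inline int32_t
sketch_vget_lane (sketch_int32x4_t vec, int lane)
{
  assert (lane >= 0 && lane < 4);
  return vec[lane];
}

/* Store a single lane through an ordinary pointer dereference rather
   than through a dedicated store-lane builtin, so the compiler sees a
   plain scalar store.  */
static inline void
sketch_vst1q_lane_s32 (int32_t *dst, sketch_int32x4_t vec, int lane)
{
  *dst = sketch_vget_lane (vec, lane);
}
```

Because the store is an ordinary assignment in this form, the usual scalar optimizations can apply to it, which is the midend benefit described above.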

Cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf.

Ok for trunk?

Cheers, Alan

