On 25/10/2024 19:47, Christophe Lyon wrote: > From: Alfie Richards <alfie.richa...@arm.com> > > Implement the mve vld and vst intrinsics using the MVE builtins framework. > > The main part of the patch is to reimplement to vstr/vldr patterns > such that we now have much fewer of them: > - non-truncating stores > - predicated non-truncating stores > - truncating stores > - predicated truncating stores > - non-extending loads > - predicated non-extending loads > - extending loads > - predicated extending loads > > This enables us to update the implementation of vld1/vst1 and use the > new vldr/vstr builtins. > > The patch also adds support for the predicated vld1/vst1 versions. > > gcc.target/arm/pr112337.c needs an update, to call the intrinsic > instead of the builtin, which this patch deletes. > > 2024-09-11 Alfie Richards <alfie.richa...@arm.com> > Christophe Lyon <christophe.l...@arm.com> > > gcc/ > > * config/arm/arm-mve-builtins-base.cc (vld1q_impl): Add support > for predicated version. > (vst1q_impl): Likewise. > (vstrq_impl): New class. > (vldrq_impl): New class. > (vldrbq): New. > (vldrhq): New. > (vldrwq): New. > (vstrbq): New. > (vstrhq): New. > (vstrwq): New. > * config/arm/arm-mve-builtins-base.def (vld1q): Add predicated > version. > (vldrbq): New. > (vldrhq): New. > (vldrwq): New. > (vst1q): Add predicated version. > (vstrbq): New. > (vstrhq): New. > (vstrwq): New. > (vrev32q): Update types to float_16. > * config/arm/arm-mve-builtins-base.h (vldrbq): New. > (vldrhq): New. > (vldrwq): New. > (vstrbq): New. > (vstrhq): New. > (vstrwq): New. > * config/arm/arm-mve-builtins-functions.h (memory_vector_mode): > Remove conversion of floating point vectors to integer. > * config/arm/arm-mve-builtins.cc (TYPES_float16): Change to... > (TYPES_float_16): ...this. > (TYPES_float_32): New. > (float16): Change to... > (float_16): ...this. > (float_32): New. > (preds_z_or_none): New. > (function_resolver::check_gp_argument): Add support for _z > predicate. > * config/arm/arm_mve.h (vstrbq): Remove. > (vstrbq_p): Likewise. > (vstrhq): Likewise. > (vstrhq_p): Likewise. > (vstrwq): Likewise. > (vstrwq_p): Likewise. > (vst1q_p): Likewise. > (vld1q_z): Likewise. > (vldrbq_s8): Likewise. > (vldrbq_u8): Likewise. > (vldrbq_s16): Likewise. > (vldrbq_u16): Likewise. > (vldrbq_s32): Likewise. > (vldrbq_u32): Likewise. > (vstrbq_s8): Likewise. > (vstrbq_s32): Likewise. > (vstrbq_s16): Likewise. > (vstrbq_u8): Likewise. > (vstrbq_u32): Likewise. > (vstrbq_u16): Likewise. > (vstrbq_p_s8): Likewise. > (vstrbq_p_s32): Likewise. > (vstrbq_p_s16): Likewise. > (vstrbq_p_u8): Likewise. > (vstrbq_p_u32): Likewise. > (vstrbq_p_u16): Likewise. > (vldrbq_z_s16): Likewise. > (vldrbq_z_u8): Likewise. > (vldrbq_z_s8): Likewise. > (vldrbq_z_s32): Likewise. > (vldrbq_z_u16): Likewise. > (vldrbq_z_u32): Likewise. > (vldrhq_s32): Likewise. > (vldrhq_s16): Likewise. > (vldrhq_u32): Likewise. > (vldrhq_u16): Likewise. > (vldrhq_z_s32): Likewise. > (vldrhq_z_s16): Likewise. > (vldrhq_z_u32): Likewise. > (vldrhq_z_u16): Likewise. > (vldrwq_s32): Likewise. > (vldrwq_u32): Likewise. > (vldrwq_z_s32): Likewise. > (vldrwq_z_u32): Likewise. > (vldrhq_f16): Likewise. > (vldrhq_z_f16): Likewise. > (vldrwq_f32): Likewise. > (vldrwq_z_f32): Likewise. > (vstrhq_f16): Likewise. > (vstrhq_s32): Likewise. > (vstrhq_s16): Likewise. > (vstrhq_u32): Likewise. > (vstrhq_u16): Likewise. > (vstrhq_p_f16): Likewise. > (vstrhq_p_s32): Likewise. > (vstrhq_p_s16): Likewise. > (vstrhq_p_u32): Likewise. > (vstrhq_p_u16): Likewise. > (vstrwq_f32): Likewise. > (vstrwq_s32): Likewise. > (vstrwq_u32): Likewise. > (vstrwq_p_f32): Likewise. > (vstrwq_p_s32): Likewise. > (vstrwq_p_u32): Likewise. > (vst1q_p_u8): Likewise. > (vst1q_p_s8): Likewise. > (vld1q_z_u8): Likewise. > (vld1q_z_s8): Likewise. > (vst1q_p_u16): Likewise. > (vst1q_p_s16): Likewise. > (vld1q_z_u16): Likewise. > (vld1q_z_s16): Likewise. > (vst1q_p_u32): Likewise. > (vst1q_p_s32): Likewise. > (vld1q_z_u32): Likewise. > (vld1q_z_s32): Likewise. > (vld1q_z_f16): Likewise. > (vst1q_p_f16): Likewise. > (vld1q_z_f32): Likewise. > (vst1q_p_f32): Likewise. > (__arm_vstrbq_s8): Likewise. > (__arm_vstrbq_s32): Likewise. > (__arm_vstrbq_s16): Likewise. > (__arm_vstrbq_u8): Likewise. > (__arm_vstrbq_u32): Likewise. > (__arm_vstrbq_u16): Likewise. > (__arm_vldrbq_s8): Likewise. > (__arm_vldrbq_u8): Likewise. > (__arm_vldrbq_s16): Likewise. > (__arm_vldrbq_u16): Likewise. > (__arm_vldrbq_s32): Likewise. > (__arm_vldrbq_u32): Likewise. > (__arm_vstrbq_p_s8): Likewise. > (__arm_vstrbq_p_s32): Likewise. > (__arm_vstrbq_p_s16): Likewise. > (__arm_vstrbq_p_u8): Likewise. > (__arm_vstrbq_p_u32): Likewise. > (__arm_vstrbq_p_u16): Likewise. > (__arm_vldrbq_z_s8): Likewise. > (__arm_vldrbq_z_s32): Likewise. > (__arm_vldrbq_z_s16): Likewise. > (__arm_vldrbq_z_u8): Likewise. > (__arm_vldrbq_z_u32): Likewise. > (__arm_vldrbq_z_u16): Likewise. > (__arm_vldrhq_s32): Likewise. > (__arm_vldrhq_s16): Likewise. > (__arm_vldrhq_u32): Likewise. > (__arm_vldrhq_u16): Likewise. > (__arm_vldrhq_z_s32): Likewise. > (__arm_vldrhq_z_s16): Likewise. > (__arm_vldrhq_z_u32): Likewise. > (__arm_vldrhq_z_u16): Likewise. > (__arm_vldrwq_s32): Likewise. > (__arm_vldrwq_u32): Likewise. > (__arm_vldrwq_z_s32): Likewise. > (__arm_vldrwq_z_u32): Likewise. > (__arm_vstrhq_s32): Likewise. > (__arm_vstrhq_s16): Likewise. > (__arm_vstrhq_u32): Likewise. > (__arm_vstrhq_u16): Likewise. > (__arm_vstrhq_p_s32): Likewise. > (__arm_vstrhq_p_s16): Likewise. > (__arm_vstrhq_p_u32): Likewise. > (__arm_vstrhq_p_u16): Likewise. > (__arm_vstrwq_s32): Likewise. > (__arm_vstrwq_u32): Likewise. > (__arm_vstrwq_p_s32): Likewise. > (__arm_vstrwq_p_u32): Likewise. > (__arm_vst1q_p_u8): Likewise. > (__arm_vst1q_p_s8): Likewise. > (__arm_vld1q_z_u8): Likewise. > (__arm_vld1q_z_s8): Likewise. > (__arm_vst1q_p_u16): Likewise. > (__arm_vst1q_p_s16): Likewise. > (__arm_vld1q_z_u16): Likewise. > (__arm_vld1q_z_s16): Likewise. > (__arm_vst1q_p_u32): Likewise. > (__arm_vst1q_p_s32): Likewise. > (__arm_vld1q_z_u32): Likewise. > (__arm_vld1q_z_s32): Likewise. > (__arm_vldrwq_f32): Likewise. > (__arm_vldrwq_z_f32): Likewise. > (__arm_vldrhq_z_f16): Likewise. > (__arm_vldrhq_f16): Likewise. > (__arm_vstrwq_p_f32): Likewise. > (__arm_vstrwq_f32): Likewise. > (__arm_vstrhq_f16): Likewise. > (__arm_vstrhq_p_f16): Likewise. > (__arm_vld1q_z_f16): Likewise. > (__arm_vst1q_p_f16): Likewise. > (__arm_vld1q_z_f32): Likewise. > (__arm_vst2q_f32): Likewise. > (__arm_vst1q_p_f32): Likewise. > (__arm_vstrbq): Likewise. > (__arm_vstrbq_p): Likewise. > (__arm_vstrhq): Likewise. > (__arm_vstrhq_p): Likewise. > (__arm_vstrwq): Likewise. > (__arm_vstrwq_p): Likewise. > (__arm_vst1q_p): Likewise. > (__arm_vld1q_z): Likewise. > * config/arm/arm_mve_builtins.def: > (vstrbq_s): Delete. > (vstrbq_u): Likewise. > (vldrbq_s): Likewise. > (vldrbq_u): Likewise. > (vstrbq_p_s): Likewise. > (vstrbq_p_u): Likewise. > (vldrbq_z_s): Likewise. > (vldrbq_z_u): Likewise. > (vld1q_u): Likewise. > (vld1q_s): Likewise. > (vldrhq_z_u): Likewise. > (vldrhq_u): Likewise. > (vldrhq_z_s): Likewise. > (vldrhq_s): Likewise. > (vld1q_f): Likewise. > (vldrhq_f): Likewise. > (vldrhq_z_f): Likewise. > (vldrwq_f): Likewise. > (vldrwq_s): Likewise. > (vldrwq_u): Likewise. > (vldrwq_z_f): Likewise. > (vldrwq_z_s): Likewise. > (vldrwq_z_u): Likewise. > (vst1q_u): Likewise. > (vst1q_s): Likewise. > (vstrhq_p_u): Likewise. > (vstrhq_u): Likewise. > (vstrhq_p_s): Likewise. > (vstrhq_s): Likewise. > (vst1q_f): Likewise. > (vstrhq_f): Likewise. > (vstrhq_p_f): Likewise. > (vstrwq_f): Likewise. > (vstrwq_s): Likewise. > (vstrwq_u): Likewise. > (vstrwq_p_f): Likewise. > (vstrwq_p_s): Likewise. > (vstrwq_p_u): Likewise. > * config/arm/iterators.md (MVE_w_narrow_TYPE): New iterator. > (MVE_w_narrow_type): New iterator. > (MVE_wide_n_TYPE): New attribute. > (MVE_wide_n_type): New attribute. > (MVE_wide_n_sz_elem): New attribute. > (MVE_wide_n_VPRED): New attribute. > (MVE_elem_ch): New attribute. > (supf): Remove VSTRBQ_S, VSTRBQ_U, VLDRBQ_S, VLDRBQ_U, VLD1Q_S, > VLD1Q_U, VLDRHQ_S, VLDRHQ_U, VLDRWQ_S, VLDRWQ_U, VST1Q_S, VST1Q_U, > VSTRHQ_S, VSTRHQ_U, VSTRWQ_S, VSTRWQ_U. > (VSTRBQ, VLDRBQ, VLD1Q, VLDRHQ, VLDRWQ, VST1Q, VSTRHQ, VSTRWQ): > Delete. > * config/arm/mve.md (mve_vstrbq_<supf><mode>): Remove. > (mve_vldrbq_<supf><mode>): Likewise. > (mve_vstrbq_p_<supf><mode>): Likewise. > (mve_vldrbq_z_<supf><mode>): Likewise. > (mve_vldrhq_fv8hf): Likewise. > (mve_vldrhq_<supf><mode>): Likewise. > (mve_vldrhq_z_fv8hf): Likewise. > (mve_vldrhq_z_<supf><mode>): Likewise. > (mve_vldrwq_fv4sf): Likewise. > (mve_vldrwq_<supf>v4si): Likewise. > (mve_vldrwq_z_fv4sf): Likewise. > (mve_vldrwq_z_<supf>v4si): Likewise. > (@mve_vld1q_f<mode>): Likewise. > (@mve_vld1q_<supf><mode>): Likewise. > (mve_vstrhq_fv8hf): Likewise. > (mve_vstrhq_p_fv8hf): Likewise. > (mve_vstrhq_p_<supf><mode>): Likewise. > (mve_vstrhq_<supf><mode>): Likewise. > (mve_vstrwq_fv4sf): Likewise. > (mve_vstrwq_p_fv4sf): Likewise. > (mve_vstrwq_p_<supf>v4si): Likewise. > (mve_vstrwq_<supf>v4si): Likewise. > (@mve_vst1q_f<mode>): Likewise. > (@mve_vst1q_<supf><mode>): Likewise. > (@mve_vstrq_<mode>): New. > (@mve_vstrq_p_<mode>): New. > (@mve_vstrq_truncate_<mode>): New. > (@mve_vstrq_p_truncate_<mode>): New. > (@mve_vldrq_<mode>): New. > (@mve_vldrq_z_<mode>): New. > (@mve_vldrq_extend_<mode><US>): New. > (@mve_vldrq_z_extend_<mode><US>): New. > * config/arm/unspecs.md: > (VSTRBQ_S): Remove. > (VSTRBQ_U): Likewise. > (VLDRBQ_S): Likewise. > (VLDRBQ_U): Likewise. > (VLD1Q_F): Likewise. > (VLD1Q_S): Likewise. > (VLD1Q_U): Likewise. > (VLDRHQ_F): Likewise. > (VLDRHQ_U): Likewise. > (VLDRHQ_S): Likewise. > (VLDRWQ_F): Likewise. > (VLDRWQ_S): Likewise. > (VLDRWQ_U): Likewise. > (VSTRHQ_F): Likewise. > (VST1Q_S): Likewise. > (VST1Q_U): Likewise. > (VSTRHQ_U): Likewise. > (VSTRWQ_S): Likewise. > (VSTRWQ_U): Likewise. > (VSTRWQ_F): Likewise. > (VST1Q_F): Likewise. > (VLDRQ): New. > (VLDRQ_Z): Likewise. > (VLDRQ_EXT): Likewise. > (VLDRQ_EXT_Z): Likewise. > (VSTRQ): Likewise. > (VSTRQ_P): Likewise. > (VSTRQ_TRUNC): Likewise. > (VSTRQ_TRUNC_P): Likewise. > > gcc/testsuite/ > * gcc.target/arm/pr112337.c: Call intrinsic instead of builtin. > --- > gcc/config/arm/arm-mve-builtins-base.cc | 134 ++- > gcc/config/arm/arm-mve-builtins-base.def | 20 +- > gcc/config/arm/arm-mve-builtins-base.h | 6 + > gcc/config/arm/arm-mve-builtins-functions.h | 13 - > gcc/config/arm/arm-mve-builtins.cc | 15 +- > gcc/config/arm/arm_mve.h | 1010 +------------------ > gcc/config/arm/arm_mve_builtins.def | 38 - > gcc/config/arm/iterators.md | 37 +- > gcc/config/arm/mve.md | 649 ++++-------- > gcc/config/arm/unspecs.md | 29 +- > gcc/testsuite/gcc.target/arm/pr112337.c | 4 +- > 11 files changed, 373 insertions(+), 1582 deletions(-) >
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > index 8c69670b161..a75b4a15dc0 100644 > --- a/gcc/config/arm/mve.md > +++ b/gcc/config/arm/mve.md [...] > @@ -6368,6 +6098,7 @@ (define_expand "@arm_mve_reinterpret<mode>" > } > ) > > + > ;; Originally expanded by 'predicated_doloop_end'. > ;; In the rare situation where the branch is too far, we do also need to > ;; revert FPSCR.LTPSIZE back to 0x100 after the last iteration. You forgot to remove this. Otherwise this, and the rest of the series are OK. R.