Hi Stam,

> -----Original Message-----
> From: Stam Markianos-Wright <stam.markianos-wri...@arm.com>
> Sent: Wednesday, September 6, 2023 6:19 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <kyrylo.tkac...@arm.com>; Richard Earnshaw
> <richard.earns...@arm.com>
> Subject: [PING][PATCH 2/2] arm: Add support for MVE Tail-Predicated Low
> Overhead Loops
>
> Hi all,
>
> This is the 2/2 patch that contains the functional changes needed
> for MVE Tail Predicated Low Overhead Loops.  See my previous email
> for a general introduction of MVE LOLs.
>
> This support is added through the already existing loop-doloop
> mechanisms that are used for non-MVE dls/le looping.
>
> Mid-end changes are:
>
> 1) Relax the loop-doloop mechanism in the mid-end to allow for
>    decrement numbers other than -1 and for `count` to be an
>    rtx containing a simple REG (which in this case will contain
>    the number of elements to be processed), rather
>    than an expression for calculating the number of iterations.
> 2) Added a new df utility function: `df_bb_regno_only_def_find` that
>    will return the DEF of a REG if it is DEF-ed only once within the
>    basic block.
>
> And many things in the backend to implement the above optimisation:
>
> 3) Implement the `arm_predict_doloop_p` target hook to instruct the
>    mid-end about Low Overhead Loops (MVE or not), as well as
>    `arm_loop_unroll_adjust` which will prevent unrolling of any loops
>    that are valid for becoming MVE Tail-Predicated Low Overhead Loops
>    (unrolling can transform a loop in ways that invalidate the dlstp/
>    letp transformation logic and the benefit of the dlstp/letp loop
>    would be considerably higher than that of unrolling)
> 4) Appropriate changes to the define_expand of doloop_end, new
>    patterns for dlstp and letp, new iterators, unspecs, etc.
> 5) `arm_mve_loop_valid_for_dlstp` and a number of checking functions:
>    * `arm_mve_dlstp_check_dec_counter`
>    * `arm_mve_dlstp_check_inc_counter`
>    * `arm_mve_check_reg_origin_is_num_elems`
>    * `arm_mve_check_df_chain_back_for_implic_predic`
>    * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
>    These all, in some way or another, run checks on the loop
>    structure in order to determine if the loop is valid for dlstp/letp
>    transformation.
> 6) `arm_attempt_dlstp_transform`: (called from the define_expand of
>    doloop_end) this function re-checks for the loop's suitability for
>    dlstp/letp transformation and then implements it, if possible.
> 7) Various utility functions:
>    * `arm_mve_get_vctp_lanes` to map
>      from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
>      to check an insn to see if it requires the VPR or not.
>    * `arm_mve_get_loop_vctp`
>    * `arm_mve_get_vctp_lanes`
>    * `arm_emit_mve_unpredicated_insn_to_seq`
>    * `arm_get_required_vpr_reg`
>    * `arm_get_required_vpr_reg_param`
>    * `arm_get_required_vpr_reg_ret_val`
>    * `arm_mve_is_across_vector_insn`
>    * `arm_is_mve_load_store_insn`
>    * `arm_mve_vec_insn_is_predicated_with_this_predicate`
>    * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate`
>
> No regressions on arm-none-eabi with various targets and on
> aarch64-none-elf. Thoughts on getting this into trunk?
The arm parts look sensible but we'd need review for the df.h and df-core.cc changes.
Maybe Jeff can help or can recommend someone to take a look?
Thanks,
Kyrill

>
> Thank you,
> Stam Markianos-Wright
>
> gcc/ChangeLog:
>
> * config/arm/arm-protos.h (arm_target_insn_ok_for_lob): Rename to...
> (arm_target_bb_ok_for_lob): ...this
> (arm_attempt_dlstp_transform): New.
> * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New.
> (TARGET_PREDICT_DOLOOP_P): New.
> (arm_block_set_vect):
> (arm_target_insn_ok_for_lob): Rename from arm_target_insn_ok_for_lob.
> (arm_target_bb_ok_for_lob): New.
> (arm_mve_get_vctp_lanes): New.
> (arm_get_required_vpr_reg): New.
> (arm_get_required_vpr_reg_param): New.
> (arm_get_required_vpr_reg_ret_val): New.
> (arm_mve_get_loop_vctp): New.
> (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New.
> (arm_mve_vec_insn_is_predicated_with_this_predicate): New.
> (arm_mve_check_df_chain_back_for_implic_predic): New.
> (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New.
> (arm_mve_check_reg_origin_is_num_elems): New.
> (arm_mve_dlstp_check_inc_counter): New.
> (arm_mve_dlstp_check_dec_counter): New.
> (arm_mve_loop_valid_for_dlstp): New.
> (arm_mve_is_across_vector_insn): New.
> (arm_is_mve_load_store_insn): New.
> (arm_predict_doloop_p): New.
> (arm_loop_unroll_adjust): New.
> (arm_emit_mve_unpredicated_insn_to_seq): New.
> (arm_attempt_dlstp_transform): New.
> * config/arm/iterators.md (DLSTP): New.
> (mode1): Add DLSTP mappings.
> * config/arm/mve.md (*predicated_doloop_end_internal): New.
> (dlstp<mode1>_insn): New.
> * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
> * config/arm/unspecs.md: New unspecs.
> * df-core.cc (df_bb_regno_only_def_find): New.
> * df.h (df_bb_regno_only_def_find): New.
> * loop-doloop.cc (doloop_condition_get): Relax conditions.
> (doloop_optimize): Add support for elementwise LoLs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/lob.h: Update framework.
> * gcc.target/arm/lob1.c: Likewise.
> * gcc.target/arm/lob6.c: Likewise.
> * gcc.target/arm/mve/dlstp-compile-asm.c: New test.
> * gcc.target/arm/mve/dlstp-int16x8.c: New test.
> * gcc.target/arm/mve/dlstp-int32x4.c: New test.
> * gcc.target/arm/mve/dlstp-int64x2.c: New test.
> * gcc.target/arm/mve/dlstp-int8x16.c: New test.
> * gcc.target/arm/mve/dlstp-invalid-asm.c: New test.
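
For readers following the thread, here is a rough scalar model in plain C of the loop shape that item 1 of the quoted description refers to: the counter holds the number of elements still to be processed, not the number of iterations, so it drops by more than one per iteration. This is only an illustrative sketch, not code from the patch. The function name `model_tail_predicated_add`, the `LANES` value of 4 (as for 32-bit elements in a 128-bit vector) and the use of an addition as the loop body are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lane count for this sketch: four 32-bit lanes per vector.  */
#define LANES ((size_t) 4)

/* Hypothetical scalar model of an element-counting loop.  The counter
   starts at the number of elements, and each iteration handles at most
   LANES of them.  The final iteration only touches the lanes that are
   still in range, which stands in for tail predication.  Returns the
   number of loop iterations executed.  */
static size_t
model_tail_predicated_add (int32_t *dst, const int32_t *a,
                           const int32_t *b, size_t num_elems)
{
  size_t remaining = num_elems;  /* Elements left, not iterations left.  */
  size_t base = 0;               /* Index of the first lane this iteration.  */
  size_t iterations = 0;

  while (remaining > 0)
    {
      /* Lanes beyond the remaining element count stay inactive.  */
      size_t active = remaining < LANES ? remaining : LANES;

      for (size_t lane = 0; lane < active; lane++)
        dst[base + lane] = a[base + lane] + b[base + lane];

      base += active;
      remaining -= active;
      iterations++;
    }

  return iterations;
}
```

With 10 elements and 4 lanes the model runs three iterations, the last one with only two active lanes, and no separate scalar tail loop is needed.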