Some comments below, all quite minor. I'll continue to review tomorrow,
I need a fresher brain for arm_mve_check_df_chain_back_for_implic_predic
;)
+static int
+arm_mve_get_vctp_lanes (rtx x)
+{
+ if (GET_CODE (x) == SET && GET_CODE (XEXP (x, 1)) == UNSPEC
+ && (XINT (XEXP (x, 1), 1) == VCTP || XINT (XEXP (x, 1), 1) ==
VCTP_M))
+ {
+ switch (GET_MODE (XEXP (x, 1)))
+ {
+ case V16BImode:
+ return 16;
+ case V8BImode:
+ return 8;
+ case V4BImode:
+ return 4;
+ case V2QImode:
+ return 2;
+ default:
+ break;
+ }
+ }
+ return 0;
+}
I think you can replace the switch with something along the lines of:
machine_mode mode = GET_MODE (XEXP (x, 1));
return VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 0;
+/* Check if an insn requires the use of the VPR_REG, if it does, return the
+ sub-rtx of the VPR_REG. The `type` argument controls whether
+ this function should:
+ * For type == 0, check all operands, including the OUT operands,
+ and return the first occurance of the VPR_REG.
s/occurance/occurrence/
+ bool requires_vpr;
+ extract_constrain_insn (insn);
indent of requires_vpr is off.
+ if (type == 1 && (recog_data.operand_type[op] == OP_OUT
+ || recog_data.operand_type[op] == OP_INOUT))
+ continue;
+ else if (type == 2 && (recog_data.operand_type[op] == OP_IN
+ || recog_data.operand_type[op] == OP_INOUT))
+ continue;
Why skip INOUT? I guess this will become clear when I see the uses, but
I'm wondering whether 'only check the input operands.' is clear enough.
Maybe 'check operands that are input only.' would be more accurate?
+ /* Fetch the reg_class for each entry and check it against the
+ * VPR_REG reg_class. */
Remove leading * on the second line.
+
+/* Wrapper function of arm_get_required_vpr_reg with type == 1, so return
+ something only if the VPR reg is an input operand to the insn. */
When talking about a function parameter in comments capitalize (INSN)
the name. Same for:
+/* Wrapper function of arm_get_required_vpr_reg with type == 2, so return
+ something only if the VPR reg is the retrurn value, an output of, or is
+ clobbered by the insn. */
+/* Return true if an insn is an MVE instruction that VPT-predicable, but in
+ its unpredicated form, or if it is predicated, but on a predicate other
+ than vpr_reg. */
In this one also 'is a MVE instruction that is VPT-predicable' would be
better I think.
On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:
> Hi all,
>
> This is the 2/2 patch that contains the functional changes needed
> for MVE Tail Predicated Low Overhead Loops. See my previous email
> for a general introduction of MVE LOLs.
>
> This support is added through the already existing loop-doloop
> mechanisms that are used for non-MVE dls/le looping.
>
> Mid-end changes are:
>
> 1) Relax the loop-doloop mechanism in the mid-end to allow for
> decrement numbers other that -1 and for `count` to be an
> rtx containing a simple REG (which in this case will contain
> the number of elements to be processed), rather
> than an expression for calculating the number of iterations.
> 2) Added a new df utility function: `df_bb_regno_only_def_find` that
> will return the DEF of a REG only if it is DEF-ed once within the
> basic block.
>
> And many things in the backend to implement the above optimisation:
>
> 3) Implement the `arm_predict_doloop_p` target hook to instruct the
> mid-end about Low Overhead Loops (MVE or not), as well as
> `arm_loop_unroll_adjust` which will prevent unrolling of any
loops
> that are valid for becoming MVE Tail_Predicated Low Overhead
Loops
> (unrolling can transform a loop in ways that invalidate the
dlstp/
> letp tranformation logic and the benefit of the dlstp/letp loop
> would be considerably higher than that of unrolling)
> 4) Appropriate changes to the define_expand of doloop_end, new
> patterns for dlstp and letp, new iterators, unspecs, etc.
> 5) `arm_mve_loop_valid_for_dlstp` and a number of checking
functions:
> * `arm_mve_dlstp_check_dec_counter`
> * `arm_mve_dlstp_check_inc_counter`
> * `arm_mve_check_reg_origin_is_num_elems`
> * `arm_mve_check_df_chain_back_for_implic_predic`
> * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
> This all, in smoe way or another, are running checks on the loop
> structure in order to determine if the loop is valid for
dlstp/letp
> transformation.
> 6) `arm_attempt_dlstp_transform`: (called from the define_expand of
> doloop_end) this function re-checks for the loop's
suitability for
> dlstp/letp transformation and then implements it, if possible.
> 7) Various utility functions:
> *`arm_mve_get_vctp_lanes` to map
> from vctp unspecs to number of lanes, and
> `arm_get_required_vpr_reg`
> to check an insn to see if it requires the VPR or not.
> * `arm_mve_get_loop_vctp`
> * `arm_mve_get_vctp_lanes`
> * `arm_emit_mve_unpredicated_insn_to_seq`
> * `arm_get_required_vpr_reg`
> * `arm_get_required_vpr_reg_param`
> * `arm_get_required_vpr_reg_ret_val`
> * `arm_mve_vec_insn_is_predicated_with_this_predicate`
> * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate`
>
> No regressions on arm-none-eabi with various targets and on
> aarch64-none-elf. Thoughts on getting this into trunk?
>
> Thank you,
> Stam Markianos-Wright
>
> gcc/ChangeLog:
>
> * config/arm/arm-protos.h (arm_target_insn_ok_for_lob):
> Rename to...
> (arm_target_bb_ok_for_lob): ...this
> (arm_attempt_dlstp_transform): New.
> * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New.
> (TARGET_PREDICT_DOLOOP_P): New.
> (arm_block_set_vect):
> (arm_target_insn_ok_for_lob): Rename from
> arm_target_insn_ok_for_lob.
> (arm_target_bb_ok_for_lob): New.
> (arm_mve_get_vctp_lanes): New.
> (arm_get_required_vpr_reg): New.
> (arm_get_required_vpr_reg_param): New.
> (arm_get_required_vpr_reg_ret_val): New.
> (arm_mve_get_loop_vctp): New.
> (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New.
> (arm_mve_vec_insn_is_predicated_with_this_predicate): New.
> (arm_mve_check_df_chain_back_for_implic_predic): New.
> (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New.
> (arm_mve_check_reg_origin_is_num_elems): New.
> (arm_mve_dlstp_check_inc_counter): New.
> (arm_mve_dlstp_check_dec_counter): New.
> (arm_mve_loop_valid_for_dlstp): New.
> (arm_predict_doloop_p): New.
> (arm_loop_unroll_adjust): New.
> (arm_emit_mve_unpredicated_insn_to_seq): New.
> (arm_attempt_dlstp_transform): New.
> * config/arm/iterators.md (DLSTP): New.
> (mode1): Add DLSTP mappings.
> * config/arm/mve.md (*predicated_doloop_end_internal): New.
> (dlstp<mode1>_insn): New.
> * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
> * config/arm/unspecs.md: New unspecs.
> * df-core.cc (df_bb_regno_only_def_find): New.
> * df.h (df_bb_regno_only_def_find): New.
> * loop-doloop.cc (doloop_condition_get): Relax conditions.
> (doloop_optimize): Add support for elementwise LoLs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/lob.h: Update framework.
> * gcc.target/arm/lob1.c: Likewise.
> * gcc.target/arm/lob6.c: Likewise.
> * gcc.target/arm/mve/dlstp-compile-asm.c: New test.
> * gcc.target/arm/mve/dlstp-int16x8.c: New test.
> * gcc.target/arm/mve/dlstp-int32x4.c: New test.
> * gcc.target/arm/mve/dlstp-int64x2.c: New test.
> * gcc.target/arm/mve/dlstp-int8x16.c: New test.
> * gcc.target/arm/mve/dlstp-invalid-asm.c: New test.