Thank you Andre for reviewing! I'll attach the updated version of the patch to the third review email (your final one thus far ;)

On 22/06/2023 16:54, Andre Vieira (lists) wrote:
Some comments below, all quite minor. I'll continue to review tomorrow, I need a fresher brain for arm_mve_check_df_chain_back_for_implic_predic  ;)

+static int
+arm_mve_get_vctp_lanes (rtx x)
+{
+  if (GET_CODE (x) == SET && GET_CODE (XEXP (x, 1)) == UNSPEC
+      && (XINT (XEXP (x, 1), 1) == VCTP || XINT (XEXP (x, 1), 1) == VCTP_M))
+    {
+      switch (GET_MODE (XEXP (x, 1)))
+    {
+      case V16BImode:
+        return 16;
+      case V8BImode:
+        return 8;
+      case V4BImode:
+        return 4;
+      case V2QImode:
+        return 2;
+      default:
+        break;
+    }
+    }
+  return 0;
+}

I think you can replace the switch with something along the lines of:
machine_mode mode = GET_MODE (XEXP (x, 1));
return VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 0;

Ah true, especially now that there are no HImode predicates!

I added an additional check of `&& VALID_MVE_PRED_MODE (mode)` as well, just to make sure we could never pick up V4SImode, etc. (although I'd never expect that to happen if `rtx x` came from a valid instruction)



+/* Check if an insn requires the use of the VPR_REG, if it does, return the
+   sub-rtx of the VPR_REG.  The `type` argument controls whether
+   this function should:
+   * For type == 0, check all operands, including the OUT operands,
+     and return the first occurance of the VPR_REG.

s/occurance/occurrence/
Done

+      bool requires_vpr;
+  extract_constrain_insn (insn);

indent of requires_vpr is off.
Done

+      if (type == 1 && (recog_data.operand_type[op] == OP_OUT
+            || recog_data.operand_type[op] == OP_INOUT))
+    continue;
+      else if (type == 2 && (recog_data.operand_type[op] == OP_IN
+                 || recog_data.operand_type[op] == OP_INOUT))
+    continue;

Why skip INOUT? I guess this will become clear when I see the uses, but I'm wondering whether 'only check the input operands.' is clear enough. Maybe 'check operands that are input only.' would be more accurate?
Oh! Thanks for spotting this. It also doesn't work with my comment at the top:
`(INOUT operands are considered both as input and output operands)`

It's been a long time since I wrote this piece, but it might be that I added this after realising that there are no insns with an OP_INOUT VPR reg. Since I don't think it's functional, I changed the code to align with the comment, instead.


+      /* Fetch the reg_class for each entry and check it against the
+       * VPR_REG reg_class.  */

Remove leading * on the second line.
Damn auto-formatters ;)
Done

+
+/* Wrapper function of arm_get_required_vpr_reg with type == 1, so return
+   something only if the VPR reg is an input operand to the insn.  */

When talking about a function parameter in comments capitalize (INSN) the name. Same for:
Done

+/* Wrapper function of arm_get_required_vpr_reg with type == 2, so return +   something only if the VPR reg is the retrurn value, an output of, or is
+   clobbered by the insn.  */

+/* Return true if an insn is an MVE instruction that VPT-predicable, but in +   its unpredicated form, or if it is predicated, but on a predicate other
+   than vpr_reg.  */

In this one also 'is a MVE instruction that is VPT-predicable' would be better I think.
Oops, thanks for spotting. Done.


On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote:
>      Hi all,
>
>      This is the 2/2 patch that contains the functional changes needed
>      for MVE Tail Predicated Low Overhead Loops.  See my previous email
>      for a general introduction of MVE LOLs.
>
>      This support is added through the already existing loop-doloop
>      mechanisms that are used for non-MVE dls/le looping.
>
>      Mid-end changes are:
>
>      1) Relax the loop-doloop mechanism in the mid-end to allow for
>         decrement numbers other that -1 and for `count` to be an
>         rtx containing a simple REG (which in this case will contain
>         the number of elements to be processed), rather
>         than an expression for calculating the number of iterations.
>      2) Added a new df utility function: `df_bb_regno_only_def_find` that >         will return the DEF of a REG only if it is DEF-ed once within the
>         basic block.
>
>      And many things in the backend to implement the above optimisation:
>
>      3)  Implement the `arm_predict_doloop_p` target hook to instruct the
>          mid-end about Low Overhead Loops (MVE or not), as well as
>          `arm_loop_unroll_adjust` which will prevent unrolling of any loops >          that are valid for becoming MVE Tail_Predicated Low Overhead Loops >          (unrolling can transform a loop in ways that invalidate the dlstp/ >          letp tranformation logic and the benefit of the dlstp/letp loop
>          would be considerably higher than that of unrolling)
>      4)  Appropriate changes to the define_expand of doloop_end, new
>          patterns for dlstp and letp, new iterators, unspecs, etc.
>      5) `arm_mve_loop_valid_for_dlstp` and a number of checking functions:
>         * `arm_mve_dlstp_check_dec_counter`
>         * `arm_mve_dlstp_check_inc_counter`
>         * `arm_mve_check_reg_origin_is_num_elems`
>         * `arm_mve_check_df_chain_back_for_implic_predic`
>         * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
>         This all, in smoe way or another, are running checks on the loop >         structure in order to determine if the loop is valid for dlstp/letp
>         transformation.
>      6) `arm_attempt_dlstp_transform`: (called from the define_expand of >          doloop_end) this function re-checks for the loop's suitability for
>          dlstp/letp transformation and then implements it, if possible.
>      7) Various utility functions:
>         *`arm_mve_get_vctp_lanes` to map
>         from vctp unspecs to number of lanes, and
> `arm_get_required_vpr_reg`
>         to check an insn to see if it requires the VPR or not.
>         * `arm_mve_get_loop_vctp`
>         * `arm_mve_get_vctp_lanes`
>         * `arm_emit_mve_unpredicated_insn_to_seq`
>         * `arm_get_required_vpr_reg`
>         * `arm_get_required_vpr_reg_param`
>         * `arm_get_required_vpr_reg_ret_val`
>         * `arm_mve_vec_insn_is_predicated_with_this_predicate`
>         * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate`
>
>      No regressions on arm-none-eabi with various targets and on
>      aarch64-none-elf. Thoughts on getting this into trunk?
>
>      Thank you,
>      Stam Markianos-Wright
>
>      gcc/ChangeLog:
>
>              * config/arm/arm-protos.h (arm_target_insn_ok_for_lob):
> Rename to...
>              (arm_target_bb_ok_for_lob): ...this
>              (arm_attempt_dlstp_transform): New.
>              * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New.
>              (TARGET_PREDICT_DOLOOP_P): New.
>              (arm_block_set_vect):
>              (arm_target_insn_ok_for_lob): Rename from
> arm_target_insn_ok_for_lob.
>              (arm_target_bb_ok_for_lob): New.
>              (arm_mve_get_vctp_lanes): New.
>              (arm_get_required_vpr_reg): New.
>              (arm_get_required_vpr_reg_param): New.
>              (arm_get_required_vpr_reg_ret_val): New.
>              (arm_mve_get_loop_vctp): New.
> (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New.
> (arm_mve_vec_insn_is_predicated_with_this_predicate): New.
>              (arm_mve_check_df_chain_back_for_implic_predic): New.
> (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New.
>              (arm_mve_check_reg_origin_is_num_elems): New.
>              (arm_mve_dlstp_check_inc_counter): New.
>              (arm_mve_dlstp_check_dec_counter): New.
>              (arm_mve_loop_valid_for_dlstp): New.
>              (arm_predict_doloop_p): New.
>              (arm_loop_unroll_adjust): New.
>              (arm_emit_mve_unpredicated_insn_to_seq): New.
>              (arm_attempt_dlstp_transform): New.
>              * config/arm/iterators.md (DLSTP): New.
>              (mode1): Add DLSTP mappings.
>              * config/arm/mve.md (*predicated_doloop_end_internal): New.
>              (dlstp<mode1>_insn): New.
>              * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
>              * config/arm/unspecs.md: New unspecs.
>              * df-core.cc (df_bb_regno_only_def_find): New.
>              * df.h (df_bb_regno_only_def_find): New.
>              * loop-doloop.cc (doloop_condition_get): Relax conditions.
>              (doloop_optimize): Add support for elementwise LoLs.
>
>      gcc/testsuite/ChangeLog:
>
>              * gcc.target/arm/lob.h: Update framework.
>              * gcc.target/arm/lob1.c: Likewise.
>              * gcc.target/arm/lob6.c: Likewise.
>              * gcc.target/arm/mve/dlstp-compile-asm.c: New test.
>              * gcc.target/arm/mve/dlstp-int16x8.c: New test.
>              * gcc.target/arm/mve/dlstp-int32x4.c: New test.
>              * gcc.target/arm/mve/dlstp-int64x2.c: New test.
>              * gcc.target/arm/mve/dlstp-int8x16.c: New test.
>              * gcc.target/arm/mve/dlstp-invalid-asm.c: New test.

Reply via email to