Hi Ajit,
I have taken a quick look at the patch; my comments are inline:

On 09/01/24 4:44 pm, Ajit Agarwal wrote:
> Hello All:
> 
> This pass is registered before ira rtl pass.
> Bootstrapped and regtested for powerpc64-linux-gnu.
> 
> No regressions for spec 2017 benchmarks and improvements for some of the
> FP and INT benchmarks.
> 
> Vladimir:
> 
> I did modify IRA and LRA register Allocators. Please review.
> 
> Thanks & Regards
> Ajit
> 
> rs6000: New pass for replacement of adjacent lxv with lxvp.

Please add the PR number.

> 
> New pass to replace adjacent memory addresses lxv with lxvp.
> This pass is registered before ira rtl pass.

Please add an explanation of what changes have been made in IRA/LRA
and why those changes are required.

> 
> 2024-01-09  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>
> 


> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f0676c830e8..4cf15e807de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
>       ;;
>  powerpc*-*-*)
>       cpu_type=rs6000
> -     extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> +     extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
>       extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>       extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>       extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -555,7 +555,7 @@ riscv*)
>       ;;
>  rs6000*-*-*)
>       extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
> -     extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> +     extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
>       extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>       target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
>       target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..e6a9810ee24 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
>       The power8 does not have instructions that automaticaly do the byte swaps
>       for loads and stores.  */
>    INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
> +  INSERT_PASS_BEFORE (pass_ira, 1, pass_analyze_vecload);

Please add a comment, similar to those on the other INSERT_PASS_BEFORE (...)
entries.

>  
>    /* Pass to do the PCREL_OPT optimization that combines the load of an
>       external symbol's address along with a single load or store using that
> diff --git a/gcc/config/rs6000/rs6000-vecload-opt.cc b/gcc/config/rs6000/rs6000-vecload-opt.cc
> new file mode 100644
> index 00000000000..f02c8337f2e
> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-vecload-opt.cc
> @@ -0,0 +1,395 @@
> +/* Subroutines used to replace lxv with lxvp
> +   for TARGET_POWER10 and TARGET_VSX,

s/,/.

Comment can be rewritten as follows to specify the fact that we replace
lxv's having adjacent addresses:
Subroutines used to replace lxv having adjacent addresses with lxvp.


> +/* Identify lxv instruction that are candidate of adjacent
> +   memory addresses and replace them with mma instruction lxvp.  */

The comment needs modification for better readability, perhaps as follows:
Identify lxv instructions that have adjacent memory addresses 
and replace them with an lxvp instruction.

> +unsigned int
> +rs6000_analyze_vecload (function *fun)
> +{
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> +  /* Rebuild ud- and du-chains.  */
> +  df_remove_problem (df_chain);
> +  df_process_deferred_rescans ();
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> +  basic_block bb;
> +  bool changed = false;
> +  rtx_insn *insn, *curr_insn = 0;
> +  rtx_insn *insn1 = 0, *insn2 = 0;
> +  bool first_vec_insn = false;
> +  unsigned int regno = 0;
> +
> +  FOR_ALL_BB_FN (bb, fun)

I am assuming that the two lxv instructions we are searching for must belong
to the same BB; otherwise, we risk moving a load insn across basic blocks.
In that case, the variable "first_vec_insn" has to be reset to false here,
each time we start processing a new BB.
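
To illustrate the per-BB reset, here is a toy model (not GCC code; "Load",
"Block" and "count_pairs" are names I made up, and a load is reduced to a
base register number plus a byte offset). The point is only that the
candidate state is declared inside the block loop, so a candidate from one
block can never pair with a load in the next:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an lxv load: base register and byte offset.
struct Load { int base; int offset; };
using Block = std::vector<Load>;

// Count pairs of consecutive loads that use the same base register and
// are 16 bytes apart, never pairing across a block boundary.
int count_pairs (const std::vector<Block> &blocks)
{
  int pairs = 0;
  for (const Block &bb : blocks)
    {
      // Reset for every block: no candidate survives from the previous one.
      bool first_vec_insn = false;
      Load first{};
      for (const Load &ld : bb)
        {
          if (first_vec_insn
              && ld.base == first.base
              && ld.offset == first.offset + 16)
            {
              ++pairs;
              first_vec_insn = false;
              continue;
            }
          // Otherwise this load becomes the new candidate.
          first = ld;
          first_vec_insn = true;
        }
    }
  return pairs;
}
```

With the two loads in one block this finds a pair; with the same two loads
split over two blocks it finds none.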

> +    FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
> +    {
> +      if (LABEL_P (insn))
> +     continue;
> +
> +      if (NONDEBUG_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SET)
> +     {

Please correct the indentation.

> +       rtx set = single_set (insn);
> +       rtx src = SET_SRC (set);
> +       machine_mode mode = GET_MODE (SET_DEST (set));
> +
> +       if (TARGET_VSX && TARGET_POWER10 && MEM_P (src))

Since this function gets called only if TARGET_VSX and TARGET_POWER10 are true,
do we need the check here again?

> +         {
> +           if (mem_operand_ds_form (src, mode)
> +               || (mode_supports_dq_form (mode)
> +               && quad_address_p (XEXP (src, 0), mode, false)))

Please correct the indentation.

> +             {
> +               if (first_vec_insn)
> +                 {
> +                   first_vec_insn = false;

first_vec_insn should be set to false only after the lxv instructions have
been replaced with the lxvp. For example, if the second lxv instruction
does not access a memory location adjacent to that of the first lxv, then we
should continue to search the BB for an lxv instruction with an adjacent
memory address.
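
A rough sketch of what I mean, again as a toy model rather than GCC code
("Load", "PairResult" and "find_pair" are hypothetical names). The candidate
is kept until a partner is actually found. Note that the real pass would
also have to prove that nothing between the two loads makes the pairing
unsafe; the sketch ignores that:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an lxv load: base register and byte offset.
struct Load { int base; int offset; };

struct PairResult { bool found; std::size_t first; std::size_t second; };

// Scan one block for a load at [base + 0] and a later load at [base + 16].
PairResult find_pair (const std::vector<Load> &bb)
{
  bool have_first = false;   // plays the role of first_vec_insn
  std::size_t first = 0;     // plays the role of insn1
  for (std::size_t i = 0; i < bb.size (); ++i)
    {
      if (have_first
          && bb[i].base == bb[first].base
          && bb[i].offset == bb[first].offset + 16)
        return PairResult{true, first, i};  // only now is the candidate done
      // Not adjacent: keep the existing candidate and keep scanning.
      if (!have_first && bb[i].offset == 0)
        {
          have_first = true;
          first = i;
        }
    }
  return PairResult{false, 0, 0};
}
```

For the sequence [r3+0], [r5+64], [r3+16] this pairs the first and third
loads, whereas clearing the flag on the second load would miss the pair.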

> +                   rtx addr = XEXP (src, 0);
> +                   insn2 = insn;
> +                   rtx insn1_src = SET_SRC (PATTERN (insn1));
> +
> +                   if (adjacent_mem_locations (insn1_src, src) == insn1_src)
> +                     {
> +                       rtx op0 = XEXP (addr, 0);
> +
> +                       if (regno == REGNO (op0))
> +                         changed = replace_lxv_with_lxvp (insn1, insn2);
> +                     }
> +                  }

Incorrect indentation.

> +
> +                 if (REG_P (XEXP (src, 0))

Incorrect indentation.

> +                     && GET_CODE (XEXP (src, 0)) != PLUS)
> +                   {
> +                     regno = REGNO (XEXP (src,0));
> +                     first_vec_insn = true;
> +                     insn1 = insn;
> +                   }
> +               }
> +           }
> +       }
> +     }
> +
> +  return changed;
> +}
> +
> +const pass_data pass_data_analyze_vecload =
> +{
> +  RTL_PASS, /* type */
> +  "vecload", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_df_finish, /* todo_flags_finish */
> +};
> +
> +class pass_analyze_vecload : public rtl_opt_pass
> +{
> +public:
> +  pass_analyze_vecload(gcc::context *ctxt)
> +    : rtl_opt_pass(pass_data_analyze_vecload, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      return (optimize > 0 && TARGET_VSX && TARGET_POWER10);
> +    }
> +
> +  virtual unsigned int execute (function *fun)
> +    {
> +      return rs6000_analyze_vecload (fun);
> +    }
> +}; // class pass_analyze_vecload
> +
> +rtl_opt_pass *
> +make_pass_analyze_vecload (gcc::context *ctxt)
> +{
> +  return new pass_analyze_vecload (ctxt);
> +}
> +
> diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
> index c715a834f12..fe78d967e75 100644
> --- a/gcc/ira-build.cc
> +++ b/gcc/ira-build.cc
> @@ -1862,7 +1862,7 @@ create_insn_allocnos (rtx x, rtx outer, bool output_p)
>           }
>  
>         ALLOCNO_NREFS (a)++;
> -       ALLOCNO_FREQ (a) += REG_FREQ_FROM_BB (curr_bb);
> +       ALLOCNO_FREQ (a) += REG_FREQ (regno);

Can you please explain why this change is required?

>         if (output_p)
>           bitmap_set_bit (ira_curr_loop_tree_node->modified_regnos, regno);
>       }

> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 214a4f16d3c..d5f6f885957 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -1047,6 +1047,8 @@ setup_profitable_hard_regs (void)
>       continue;
>        data = ALLOCNO_COLOR_DATA (a);
>        if (ALLOCNO_UPDATED_HARD_REG_COSTS (a) == NULL
> +       && ALLOCNO_CLASS_COST (a) > 0
> +       && ALLOCNO_MEMORY_COST (a) > 0 

Why do we have these checks for positive cost?

>         && ALLOCNO_CLASS_COST (a) > ALLOCNO_MEMORY_COST (a)
>         /* Do not empty profitable regs for static chain pointer
>            pseudo when non-local goto is used.  */
> @@ -1131,6 +1133,8 @@ setup_profitable_hard_regs (void)
>                                      hard_regno))
>               continue;
>             if (ALLOCNO_UPDATED_MEMORY_COST (a) < costs[j]
> +               && ALLOCNO_UPDATED_MEMORY_COST (a) > 0
> +               && costs[j] > 0

Why do we have these checks for positive cost?
Please note that costs can be negative.
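
To make the concern concrete, here is a small standalone comparison (the
function names and the numbers are made up for illustration; they are not
taken from an IRA dump). It contrasts the original test with the guarded
one when both costs are negative:

```cpp
#include <cassert>

// Original condition: memory is cheaper than the hard register.
bool memory_cheaper (int mem_cost, int reg_cost)
{
  return mem_cost < reg_cost;
}

// Condition with the added "> 0" guards, mirroring the shape of the patch.
bool memory_cheaper_guarded (int mem_cost, int reg_cost)
{
  return mem_cost < reg_cost && mem_cost > 0 && reg_cost > 0;
}
```

For costs (-5, -2) the original test is true, but the guarded test is false,
so the two versions make different decisions as soon as a cost is negative.
For positive costs such as (3, 7) they agree.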

>                 /* Do not remove HARD_REGNO for static chain pointer
>                    pseudo when non-local goto is used.  */
>                 && ! non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a)))
> diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
> index 7aa210e986f..46ab3b5f165 100644
> --- a/gcc/lra-assigns.cc
> +++ b/gcc/lra-assigns.cc
> @@ -1638,6 +1737,7 @@ lra_assign (bool &fails_p)
>    bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
>    create_live_range_start_chains ();
>    setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
> +#if 0

Please remove the code instead of enclosing it in #if 0.

>    if (! lra_hard_reg_split_p && ! lra_asm_error_p && flag_checking)
>      /* Check correctness of allocation but only when there are no hard reg
>         splits and asm errors as in the case of errors explicit insns involving
> @@ -1649,6 +1749,7 @@ lra_assign (bool &fails_p)
>         && overlaps_hard_reg_set_p (lra_reg_info[i].conflict_hard_regs,
>                                     PSEUDO_REGNO_MODE (i), reg_renumber[i]))
>       gcc_unreachable ();
> +#endif
>    /* Setup insns to process on the next constraint pass.  */
>    bitmap_initialize (&changed_pseudo_bitmap, &reg_obstack);
>    init_live_reload_and_inheritance_pseudos ();
> diff --git a/gcc/lra-int.h b/gcc/lra-int.h
> index 5cdf92be7fc..962fb351ba0 100644
> --- a/gcc/lra-int.h
> +++ b/gcc/lra-int.h
> @@ -95,6 +95,7 @@ public:
>       *non-debug* insns.       */
>    int nrefs, freq;
>    int last_reload;
> +  bool pseudo_conflict;

Please add a comment describing this field.

>    /* rtx used to undo the inheritance.  It can be non-null only
>       between subsequent inheritance and undo inheritance passes.  */
>    rtx restore_rtx;
> diff --git a/gcc/lra.cc b/gcc/lra.cc
> index 69081a8e025..5cc97ce7506 100644
> --- a/gcc/lra.cc
> +++ b/gcc/lra.cc
> @@ -1359,6 +1359,7 @@ initialize_lra_reg_info_element (int i)
>    lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
>    lra_reg_info[i].last_reload = 0;
>    lra_reg_info[i].restore_rtx = NULL_RTX;
> +  lra_reg_info[i].pseudo_conflict = false;
>    lra_reg_info[i].val = get_new_reg_value ();
>    lra_reg_info[i].offset = 0;
>    lra_reg_info[i].copies = NULL;
> diff --git a/gcc/testsuite/g++.target/powerpc/vecload.C b/gcc/testsuite/g++.target/powerpc/vecload.C
> new file mode 100644
> index 00000000000..0d998aa7054
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/vecload.C
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */ 
> +/* { dg-require-effective-target powerpc_p9vector_ok } */

This should be "power10_ok" and not "powerpc_p9vector_ok". The same applies
to the other tests.

Regards,
Surya
