Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

Manolis Tsamis Tue, 03 Oct 2023 04:50:43 -0700

On Fri, Sep 29, 2023 at 10:22 PM Jeff Law <jeffreya...@gmail.com> wrote:
>
>
>
> On 9/12/23 04:13, Manolis Tsamis wrote:
>
> >>> +
> >>> +/* Get the single reaching definition of an instruction inside a BB.
> >>> +   The definition is desired for REG used in INSN.
> >>> +   Return the definition insn or NULL if there's no definition with
> >>> +   the desired criteria.  */
> >>> +static rtx_insn*
> >>> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> >>> +{
> >>> +  df_ref use;
> >>> +  struct df_link *ref_chain, *ref_link;
> >>> +
> >>> +  FOR_EACH_INSN_USE (use, insn)
> >>> +    {
> >>> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> >>> +     return NULL;
> >>> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> >>> +     break;
> >>> +    }
> >>> +
> >>> +  if (!use)
> >>> +    return NULL;
> >>> +
> >>> +  ref_chain = DF_REF_CHAIN (use);
> >> So what if there's two uses of REG in INSN?  I don't think it's be
> >> common at all, but probably better safe and reject than sorry, right? Or
> >> is that case filtered out earlier?
> >>
> > If the REG is the same won't the definitions be the same even if that
> > REG appears multiple times in INSN?
> Yes.
Good, so no issues here.
>
> > fold_offsets_1 should be able to handle the folding with multiple uses
> > of REG just fine, for example add R1, R1 or add (ashift R1, 1), R1.
> > If there's no other issue here I assume we want to keep that as-is in
> > order to not reduce the propagation power (Which I assume is similar
> > to ree which uses the same logic).
> OK.  I was primarily concerned about the folding and rewriting aspects.
> It probably can only show up on targets with LEA like instructions, and
> even on such targets it's probably rate.
>
OK.
>
>
>
> >>> +/* Test if INSN is a memory load / store that can have an offset folded 
> >>> to it.
> >>> +   Return true iff INSN is such an instruction and return through 
> >>> MEM_OUT,
> >>> +   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that 
> >>> is
> >>> +   used as a base address and the offset accordingly.
> >>> +   All of the out pointers may be NULL in which case they will be 
> >>> ignored.  */
> >>> +bool
> >>> +get_fold_mem_root (rtx_insn* insn, rtx *mem_out, rtx *reg_out,
> >>> +                HOST_WIDE_INT *offset_out)
> >>> +{
> >>> +  rtx set = single_set (insn);
> >>> +  rtx mem = NULL_RTX;
> >>> +
> >>> +  if (set != NULL_RTX)
> >>> +    {
> >>> +      rtx src = SET_SRC (set);
> >>> +      rtx dest = SET_DEST (set);
> >>> +
> >>> +      /* Don't fold when we have unspec / volatile.  */
> >>> +      if (GET_CODE (src) == UNSPEC
> >>> +       || GET_CODE (src) == UNSPEC_VOLATILE
> >>> +       || GET_CODE (dest) == UNSPEC
> >>> +       || GET_CODE (dest) == UNSPEC_VOLATILE)
> >>> +     return false;
> >>> +
> >>> +      if (MEM_P (src))
> >>> +     mem = src;
> >>> +      else if (MEM_P (dest))
> >>> +     mem = dest;
> >>> +      else if ((GET_CODE (src) == SIGN_EXTEND
> >>> +             || GET_CODE (src) == ZERO_EXTEND)
> >>> +            && MEM_P (XEXP (src, 0)))
> >> Note some architectures allow both a source and destination memory.  It
> >> looks like your code will prefer the source operand in that case.
> >> That's fine, just pointing it out.
> >>
> > Thanks for pointing that possibility out. I thought for a moment that
> > this would be a bug with multiple mentions of the address register.
> > but it should be fine due to:
> > /* Special case: A foldable memory store is not foldable if it
> > mentions DEST outside of the address calculation. */
> ACK.
>
> The other thing I keep pondering is autoincrement style addressing.
> Though I think at some point I convinced myself they weren't a problem.
> I think your checks only allow specific kinds of expressions for the
> memory address and I don't think {PRE,POST}_{INC,DEC.MODIFY} were in the
> list of valid ops.
>
Yes, although I haven't considered pre/post-inc/dec, they're indeed
not allowed due to what is allowed to be a root memory operation
(get_fold_mem_root).
I believe these shouldn't be an issue as anything that transitively
affects even one of these will be rejected.


>
> >>> +
> >>> +  int max_iters = 5;
> >>> +  for (int i = 0; i < max_iters; i++)
> >>> +    {
> >>> +      bool made_changes = false;
> >>> +      for (fold_info_map::iterator iter = fold_info->begin ();
> >>> +        iter != fold_info->end (); ++iter)
> >>> +     {
> >>> +       fold_mem_info *info = (*iter).second;
> >>> +       if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> >>> +         made_changes |= bitmap_ior_into (&cannot_fold_insns,
> >>> +                                          info->fold_insns);
> >>> +     }
> >>> +
> >>> +      if (!made_changes)
> >>> +     return true;
> >>> +    }
> >>> +
> >>> +  return false;
> >> So how was the magic value of "5" determined here?  In general we try
> >> not to have magic #s like that and instead find a better way to control
> >> iterations, falling back to a PARAM when all else fails.
> >>
> > It's indeed just a magic value :)
> > In general even two iterations of this should be quite rare. One
> > iteration will handle the majority of interesting cases. I chose 5 to
> > limit the worst case execution time for degenerate cases.
> >
> > I'll be happy to change that to something better but I couldn't
> > figure out how "falling back to a PARAM when all else fails"  should
> > work here. Is there a similar example somewhere else in the code that
> > I can look at?
> If we expect two iterations to be quite rare, then I doubt a param is
> really worth the effort.  I might suggest using a loop like
>
> for (pass = 0; pass < flag_expensive_optimizations + 1; pass++)
>
> No magic numbers :-)   We use similar loops elsewhere for this kind of
> scenario.
>
Oh I see, thanks for noting that. I initially added a param, but this
is much better as the param wasn't useful in a meaningful way.
Done.

>
> If you ever need to create a param, the procedure is to add the option
> to params.opt.  If you look in that file you'll see that you define a
> variable name in the specification that you can then reference.
>
> So for a simple example look for param_avg_loop_niter.
>
So, V6 with the formatting fixes and the iteration count change is
here: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631839.html

Thanks for all the feedback!
Manolis

> Jeff

Re: [PATCH v5] Implement new RTL optimizations pass: fold-mem-offsets.

Reply via email to