On Fri, 7 Jun 2024, Jeff Law wrote:

> 
> 
> On 6/6/24 4:10 AM, Manolis Tsamis wrote:
> > This pass detects cases of expensive store forwarding and tries to avoid
> > them
> > by reordering the stores and using suitable bit insertion sequences.
> > For example it can transform this:
> > 
> >       strb    w2, [x1, 1]
> >       ldr     x0, [x1]      # Expensive store forwarding to larger load.
> > 
> > To:
> > 
> >       ldr     x0, [x1]
> >       strb    w2, [x1]
> >       bfi     x0, x2, 0, 8
> > 
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following
> > speedups
> > have been observed.
> > 
> >    Neoverse-N1:      +29.4%
> >    Intel Coffeelake: +13.1%
> >    AMD 5950X:        +17.5%
> > 
> > gcc/ChangeLog:
> > 
> >  * Makefile.in: Add avoid-store-forwarding.o.
> >  * common.opt: New option -favoid-store-forwarding.
> >  * params.opt: New param store-forwarding-max-distance.
> >  * doc/invoke.texi: Document new pass.
> >  * doc/passes.texi: Document new pass.
> >  * passes.def: Schedule a new pass.
> >  * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> >  * avoid-store-forwarding.cc: New file.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >  * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
> >  * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
> >  * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
> >  * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
> So this is getting a lot more interesting.  I think the first time I looked at
> this it was more concerned with stores feeding something like a load-pair and
> avoiding the store forwarding penalty for that case.  Am I mis-remembering, or
> did it get significantly more general?
> 
> 
> 
> 
> 
> > +
> > +static unsigned int stats_sf_detected = 0;
> > +static unsigned int stats_sf_avoided = 0;
> > +
> > +static rtx
> > +get_load_mem (rtx expr)
> Needs a function comment.  You should probably mention that EXPR must be a
> single_set in that comment.
> 
> 
> 
>  +
> > +  rtx dest;
> > +  if (eliminate_load)
> > +    dest = gen_reg_rtx (load_inner_mode);
> > +  else
> > +    dest = SET_DEST (load);
> > +
> > +  int move_to_front = -1;
> > +  int total_cost = 0;
> > +
> > +  /* Check if we can emit bit insert instructions for all forwarded stores.
> > */
> > +  FOR_EACH_VEC_ELT (stores, i, it)
> > +    {
> > +      it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
> > +      rtx_insn *insns = NULL;
> > +
> > +      /* If we're eliminating the load then find the store with zero offset
> > +    and use it as the base register to avoid a bit insert.  */
> > +      if (eliminate_load && it->offset == 0)
> So often is this triggering?  We have various codes in the gimple optimizers
> to detect store followed by a load from the same address and do the
> forwarding.  If they're happening with any frequency that would be a good sign
> code in DOM and elsewhere isn't working well.
> 
> THe way these passes detect this case is to take store, flip the operands
> around (ie, it looks like a load) and enter that into the expression hash
> tables.  After that standard redundancy elimination approaches will work.
> 
> 
> > +   {
> > +     start_sequence ();
> > +
> > +     /* We can use a paradoxical subreg to force this to a wider mode, as
> > +        the only use will be inserting the bits (i.e., we don't care
> > about
> > +        the value of the higher bits).  */
> Which may be a good hint about the cases you're capturing -- if the
> modes/sizes differ that would make more sense since I don't think we're as
> likely to be capturing those cases.

Yeah, we handle stores from constants quite well and FRE can forward
stores from SSA names to smaller loads by inserting BIT_FIELD_REFs but 
it does have some additional restrictions.

> 
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 4e8967fd8ab..c769744d178 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -12657,6 +12657,15 @@ loop unrolling.
> >   This option is enabled by default at optimization levels @option{-O1},
> >   @option{-O2}, @option{-O3}, @option{-Os}.
> >   
> > +@opindex favoid-store-forwarding
> > +@item -favoid-store-forwarding
> > +@itemx -fno-avoid-store-forwarding
> > +Many CPUs will stall for many cycles when a load partially depends on
> > previous
> > +smaller stores.  This pass tries to detect such cases and avoid the penalty
> > by
> > +changing the order of the load and store and then fixing up the loaded
> > value.
> > +
> > +Disabled by default.
> Is there any particular reason why this would be off by default at -O1 or
> higher?  It would seem to me that on modern cores that this transformation
> should easily be a win.  Even on an old in-order core, avoiding the load with
> the bit insert is likely profitable, just not as much so.

I would think it's the targets to decide for a default.

> > diff --git a/gcc/params.opt b/gcc/params.opt
> > index d34ef545bf0..b8115f5c27a 100644
> > --- a/gcc/params.opt
> > +++ b/gcc/params.opt
> > @@ -1032,6 +1032,10 @@ Allow the store merging pass to introduce unaligned
> > stores if it is legal to do
> >   Common Joined UInteger Var(param_store_merging_max_size) Init(65536)
> >   IntegerRange(1, 65536) Param Optimization
> >   Maximum size of a single store merging region in bytes.
> >   
> > +-param=store-forwarding-max-distance=
> > +Common Joined UInteger Var(param_store_forwarding_max_distance) Init(10)
> > IntegerRange(1, 1000) Param Optimization
> > +Maximum number of instruction distance that a small store forwarded to a
> > larger load may stall.
> I think you may need to run the update-urls script since you've added a new
> option.
> 
> 
> In general it seems pretty reasonable.
> 
> I've actually added it to my tester just to see if there's any fallout. It'll
> take a week to churn through the long running targets that bootstrap in QEMU,
> but the crosses should have data Monday.
> 
> jeff
> 
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to