Yes, as Jeff mentioned, I have some work in that area.

The first is related to address computation when the address has a large
constant part.
Suppose we have this code:

    int  consume (void *);
    int foo (void) {
       int x[1000];
       return consume (x);
    }

Before IRA we have the following sequence (x occupies 4000 bytes, and its
address ends up as frame - 0xfa0):
    19: r140:DI=0xfffffffffffff000
    20: r136:DI=r140:DI+0x60
      REG_EQUAL 0xfffffffffffff060
    8: a0:DI=frame:DI+r136:DI
      REG_DEAD r136:DI

but during IRA (eliminate_regs_in_insn) insn 8 is transformed to:
   8: a0:DI=r136:DI+0xfa0+frame:DI
        REG_DEAD r136:DI

and in the end we get the following suboptimal sequence:
   21: r136:DI=0xfffffffffffff060
      REG_EQUIV 0xfffffffffffff060
   25: r143:DI=0x1000
   26: r142:DI=r143:DI-0x60
      REG_DEAD r143:DI
      REG_EQUAL 0xfa0
   27: r142:DI=r142:DI+r136:DI
      REG_DEAD r136:DI
   8: a0:DI=r142:DI+frame:DI
      REG_DEAD r142:DI

My changes prevent that transformation.
I have tested on SPEC and did not get any regressions; besides that, the run
executed 40 billion fewer instructions.
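
For context: a RISC-V addi immediate is a signed 12-bit value, so a constant
part of an address folds into the computation for free only when it is small.
0xfffffffffffff000 is a single lui and the extra 0x60 a single addi, which is
why the constant was split that way before IRA (riscv.h has this 12-bit test
as the SMALL_OPERAND macro). A standalone sketch of that boundary, for
illustration only and not the patch itself:

    /* Illustration: an addi immediate is a signed 12-bit value, so an
       offset is "free" in an address computation only if it lies in
       [-2048, 2047].  0x60 (96) qualifies; the combined constant
       0xfffffffffffff060 (-4000) does not.  */
    #include <stdbool.h>
    #include <stdint.h>

    static bool
    fits_in_addi_imm (int64_t offset)
    {
      return offset >= -2048 && offset <= 2047;
    }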

The second piece of work is related to hoisting loop-invariant code out of
loops.
I have a test case where an SP + const computation can be hoisted out:
......
.L3:
      call foo
      addi a5,sp,16
      sh3add a0,a0,a5
.......

Before IRA that code is already out of the loop, but IRA moves it back in.
My approach was implemented in update_equiv_regs().
It prevents the move when a register has a single def and a single use and
that use is inside a loop.
Currently, that improvement is under evaluation.
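
A minimal sketch of the shape of that check, reusing the reg_equiv condition
from update_equiv_regs() that is quoted further down; the use_insn variable
and the loop-depth comparison are only my illustration here (and assume loop
info is still available at that point), not the actual change:

    /* Sketch: keep the existing single-set/single-use test, but refuse
       to mark the equivalence as replaceable when the single use sits
       in a deeper loop than the def, so combine_and_move_insns() cannot
       drag the loop-invariant computation back into the loop.  */
    if (REG_N_REFS (regno) == 2
        && (rtx_equal_p (replacement, src)
            || ! equiv_init_varies_p (src))
        && NONJUMP_INSN_P (insn)
        && equiv_init_movable_p (PATTERN (insn), regno)
        /* use_insn: the single use of regno (hypothetical variable).  */
        && bb_loop_depth (BLOCK_FOR_INSN (use_insn))
           <= bb_loop_depth (BLOCK_FOR_INSN (insn)))
      reg_equiv[regno].replace = 1;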


On Sat, Aug 12, 2023 at 4:05 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

>
>
> On 8/11/23 17:32, Vineet Gupta wrote:
> >
> > On 8/1/23 12:17, Vineet Gupta wrote:
> >> Hi Jeff,
> >>
> >> As discussed this morning, I'm sending over dumps for the optim of DF
> >> const -0.0 (PR/110748)  [1]
> >> For rv64gc_zbs build, IRA is undoing the split which eventually leads
> >> to ICE in final pass.
> >>
> >> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15
> >>
> >> void znd(double *d) {  *d = -0.0;   }
> >>
> >>
> >> *split1*
> >>
> >> (insn 10 3 11 2 (set (reg:DI 136)
> >>         (const_int [0x8000000000000000])) "neg.c":4:5 -1
> >>
> >> (insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
> >>         (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1
> >>
> >> *ira*
> >>
> >> (insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
> >>         (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190
> >> {*movdf_hardfloat_rv64}
> >>      (expr_list:REG_DEAD (reg:DI 135)
> >>
> >>
> >> For the working case, the large const is not involved and not subject
> >> to IRA playing foul.
> >
> > I investigated this some more. So IRA update_equiv_regs () has code
> > identifying potential replacements: if a reg is referenced exactly
> > twice: set once and used once.
> >
> >            if (REG_N_REFS (regno) == 2
> >                && (rtx_equal_p (replacement, src)
> >                || ! equiv_init_varies_p (src))
> >                && NONJUMP_INSN_P (insn)
> >                && equiv_init_movable_p (PATTERN (insn), regno))
> >              reg_equiv[regno].replace = 1;
> >          }
> >
> > And combine_and_move_insns () does the replacement, undoing the split1
> > above.
> Right.  This is as expected.  There was actually similar code that goes
> back even before the introduction of IRA -- like to the 80s and 90s.
>
> Conceptually the idea is a value with an equivalence that has a single
> set and single use isn't a good use of a hard register.  Better to
> narrow the live range to a single pair of instructions.
>
> It's not always a good tradeoff.  Consider if the equivalence was also a
> loop invariant and hoisted out of the loop and register pressure is low.
>
>
> >
> > In fact this is the reason for many more split1 being undone. See the
> > suboptimal codegen for large const for Andrew Pinski's test case [1]
> No doubt.  I think it's also a problem with some of Jivan's work.
>
>
> >
> > I'm wondering (naively) if there is some way to tune this - for a given
> > backend. In general it would make sense to do the replacement, but not
> > if the cost changes (e.g. consts could be embedded in x86 insn freely,
> > but not for RISC-V where this is costly and if something is split, it
> > might have been intentional).
> I'm not immediately aware of a way to tune.
>
> When it comes to tuning, the toplevel questions are do we have any of
> the info we need to tune at the point where the transformation occurs.
> The two most obvious pieces here would be loop info and register pressure.
>
> ie, do we have enough loop structure to know if the def is at a
> shallower loop nest than the use.  There's a reasonable chance we have
> this information as my recollection is this analysis is done fairly
> early in IRA.
>
> But that means we likely don't have any sense of register pressure at
> the points between the def and use.   So the most useful metric for
> tuning isn't really available.
>
> The one thing that stands out is we don't do this transformation at all
> when register pressure sensitive scheduling is enabled.  And we really
> should be turning that on by default.  Our data shows register pressure
> sensitive scheduling is about a 6-7% cycle improvement on x264 as it
> avoids spilling in those key satd loops.
>
> >  /* Don't move insns if live range shrinkage or register
> >      pressure-sensitive scheduling were done because it will not
> >      improve allocation but likely worsen insn scheduling.  */
> >   if (optimize
> >       && !flag_live_range_shrinkage
> >       && !(flag_sched_pressure && flag_schedule_insns))
> >     combine_and_move_insns ();
>
>
> So you might want to look at register pressure sensitive scheduling
> first.  If you go into x264_r from specint and look at
> x264_pixel_satd_8x4.  First verify the loops are fully unrolled.  If
> they are, then look for 32bit loads/stores into the stack.  If you have
> them, then you're spilling and getting crappy performance.  Using
> register pressure sensitive scheduling should help significantly.
>
> We've certainly seen that internally.  The plan was to submit a patch to
> make register pressure sensitive scheduling the default when the
> scheduler is enabled.  We just haven't pushed on it.  If you can verify
> that you're seeing spilling as well, then it'd certainly bolster the
> argument that register-pressure-sensitive-scheduling is desirable.
>
> Jeff
>

-- 
With the best regards
Jivan Hakobyan
