Yes, as Jeff mentioned, I have some work in that scope. The first is related to address computation when it has a large constant part. Suppose we have this code:
int consume (void *);

int foo (void)
{
  int x[1000];
  return consume (x);
}

Before IRA we have the following sequence:

  19: r140:DI=0xfffffffffffff000
  20: r136:DI=r140:DI+0x60
        REG_EQUAL 0xfffffffffffff060
   8: a0:DI=frame:DI+r136:DI
        REG_DEAD r136:DI

but during IRA (eliminate_regs_in_insn) insn 8 is transformed into

   8: a0:DI=r136:DI+0xfa0+frame:DI
        REG_DEAD r136:DI

and in the end we get this wrong sequence:

  21: r136:DI=0xfffffffffffff060
        REG_EQUIV 0xfffffffffffff060
  25: r143:DI=0x1000
  26: r142:DI=r143:DI-0x60
        REG_DEAD r143:DI
        REG_EQUAL 0xfa0
  27: r142:DI=r142:DI+r136:DI
        REG_DEAD r136:DI
   8: a0:DI=r142:DI+frame:DI
        REG_DEAD r142:DI

My changes prevent that transformation. I have tested on SPEC and did not
get any regressions; besides, we executed 40B fewer instructions.

The second piece of work is related to hoisting loop-invariant code. I have
a test case where SP + const can be hoisted out of the loop:

  ......
  .L3:
          call    foo
          addi    a5,sp,16
          sh3add  a0,a0,a5
  .......

Before IRA that code is already out of the loop, but IRA moves it back in.
My approach is implemented in update_equiv_regs(): it prevents the move if
the register's defs and uses are all in a single place and it is used
inside a loop. Currently, that improvement is under evaluation.

On Sat, Aug 12, 2023 at 4:05 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

>
>
> On 8/11/23 17:32, Vineet Gupta wrote:
> >
> > On 8/1/23 12:17, Vineet Gupta wrote:
> >> Hi Jeff,
> >>
> >> As discussed this morning, I'm sending over dumps for the optim of DF
> >> const -0.0 (PR/110748) [1]
> >> For rv64gc_zbs build, IRA is undoing the split which eventually leads
> >> to ICE in final pass.
> >>
> >> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748#c15
> >>
> >> void znd(double *d) { *d = -0.0; }
> >>
> >>
> >> *split1*
> >>
> >> (insn 10 3 11 2 (set (reg:DI 136)
> >>         (const_int [0x8000000000000000])) "neg.c":4:5 -1
> >>
> >> (insn 11 10 0 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
> >>         (subreg:DF (reg:DI 136) 0)) "neg.c":4:5 -1
> >>
> >> *ira*
> >>
> >> (insn 11 9 12 2 (set (mem:DF (reg:DI 135) [1 *d_2(D)+0 S8 A64])
> >>         (const_double:DF -0.0 [-0x0.0p+0])) "neg.c":4:5 190
> >>         {*movdf_hardfloat_rv64}
> >>      (expr_list:REG_DEAD (reg:DI 135)
> >>
> >>
> >> For the working case, the large const is not involved and not subject
> >> to IRA playing foul.
> >
> > I investigated this some more. So IRA update_equiv_regs () has code
> > identifying potential replacements: if a reg is referenced exactly
> > twice -- set once and used once:
> >
> >       if (REG_N_REFS (regno) == 2
> >           && (rtx_equal_p (replacement, src)
> >               || ! equiv_init_varies_p (src))
> >           && NONJUMP_INSN_P (insn)
> >           && equiv_init_movable_p (PATTERN (insn), regno))
> >         reg_equiv[regno].replace = 1;
> >     }
> >
> > And combine_and_move_insns () does the replacement, undoing the split1
> > above.
>
> Right.  This is as expected.  There was actually similar code that goes
> back even before the introduction of IRA -- like to the 80s and 90s.
>
> Conceptually the idea is that a value with an equivalence that has a
> single set and single use isn't a good use of a hard register.  Better
> to narrow the live range to a single pair of instructions.
>
> It's not always a good tradeoff.  Consider if the equivalence was also a
> loop invariant, was hoisted out of the loop, and register pressure is low.
>
> > In fact this is the reason for many more split1s being undone.  See the
> > suboptimal codegen for large consts in Andrew Pinski's test case [1]
>
> No doubt.  I think it's also a problem with some of Jivan's work.
>
> > I'm wondering (naively) if there is some way to tune this for a given
> > backend.  In general it would make sense to do the replacement, but not
> > if the cost changes (e.g. consts can be embedded in x86 insns freely,
> > but not for RISC-V, where this is costly, and if something was split,
> > it might have been intentional).
>
> I'm not immediately aware of a way to tune.
>
> When it comes to tuning, the toplevel question is do we have any of
> the info we need to tune at the point where the transformation occurs.
> The two most obvious pieces here would be loop info and register pressure.
>
> ie, do we have enough loop structure to know if the def is at a
> shallower loop nest than the use.  There's a reasonable chance we have
> this information, as my recollection is this analysis is done fairly
> early in IRA.
>
> But that means we likely don't have any sense of register pressure at
> the points between the def and use.  So the most useful metric for
> tuning isn't really available.
>
> The one thing that stands out is we don't do this transformation at all
> when register-pressure-sensitive scheduling is enabled.  And we really
> should be turning that on by default.  Our data shows
> register-pressure-sensitive scheduling is about a 6-7% cycle
> improvement on x264 as it avoids spilling in those key satd loops.
>
> >   /* Don't move insns if live range shrinkage or register
> >      pressure-sensitive scheduling were done because it will not
> >      improve allocation but likely worsen insn scheduling.  */
> >   if (optimize
> >       && !flag_live_range_shrinkage
> >       && !(flag_sched_pressure && flag_schedule_insns))
> >     combine_and_move_insns ();
>
> So you might want to look at register-pressure-sensitive scheduling
> first.  If you go into x264_r from specint and look at
> x264_pixel_satd_8x4, first verify the loops are fully unrolled.  If
> they are, then look for 32-bit loads/stores into the stack.  If you
> have them, then you're spilling and getting crappy performance.  Using
> register-pressure-sensitive scheduling should help significantly.
>
> We've certainly seen that internally.  The plan was to submit a patch
> to make register-pressure-sensitive scheduling the default when the
> scheduler is enabled.  We just haven't pushed on it.  If you can verify
> that you're seeing spilling as well, then it'd certainly bolster the
> argument that register-pressure-sensitive scheduling is desirable.
>
> Jeff
>

-- 
With the best regards
Jivan Hakobyan