Hi Jeff,

Thx for the detailed explanation and insight.

On 6/7/23 16:44, Jeff Law wrote:
With 2e886eef7f2b, recog() matches the define_insn_and_split "*mvconst_internal" during cse1, eliding the insn sequence for a const_int.

    (insn 7 6 8 2 (set (reg:DI 137)
         (const_int [0x1010101])) {*mvconst_internal}
         (expr_list:REG_EQUAL (const_int [0x1010101])))
    [...]

    (insn 11 10 12 2 (set (reg:DI 140)
         (const_int [0x1010101_00000000])) {*mvconst_internal}
         (expr_list:REG_EQUAL (const_int [0x1010101_00000000])))
Understood.  Not ideal, but we generally don't have good ways to limit patterns to being available at different times during the optimization phase.  One thing you might want to try (which I thought we used at one point) was to make the pattern conditional on cse_not_expected.  The goal would be to avoid exposing the pattern until a later point in the optimizer pipeline.  It may have been the case that we dropped that over time during development.  It's all getting fuzzy at this point.

Gave this a try (sketch below) and it seems to fix Andrew's test, but it then regresses the actual large-const case, 0x1010101_01010101: the mem-to-const_int transformation was being done in cse1, which no longer happens, so the constant-pool reference from initial expand survives all the way into the generated asm. I don't think we want to go back to that state.
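Roughly the change I tried (a sketch from memory against riscv.md, so the exact predicate/body may not match trunk):

    (define_insn_and_split "*mvconst_internal"
      [(set (match_operand:GPR 0 "register_operand" "=r")
            (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
      "cse_not_expected"   ;; was "": keep pattern hidden until after CSE
      "#"
      "&& 1"
      [(const_int 0)]
    {
      riscv_move_integer (operands[0], operands[0], INTVAL (operands[1]),
                          <GPR:MODE>mode);
      DONE;
    })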



Eventually split1 breaks it up using the same mvconst_internal splitter, but the cse opportunity has been lost.
Right.  I'd have to look at the pass definitions, but I suspect the splitting pass where this happens is after the last standard CSE pass. So we don't get a chance to CSE the constant synthesis.

Yep, split1 and friends happen after cse1 and cse2 (rough pass order below).  At -O2, gcse doesn't kick in, and even if forced to, it is currently limited in what it can do, more so given this is post reload.
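For reference, the approximate RTL pass order (abbreviated, from memory of passes.def, so take the exact names with a grain of salt):

    cse1 -> fwprop1 -> cprop/pre/hoist (gcse) -> cse2 -> combine
         -> split1 -> ira -> reload -> postreload_cse (reload_cse_regs)
         -> gcse2 (post-reload gcse) -> split2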


*This is now the baseline for large-const handling in the RV backend, which we all need to be aware of.*
Understood.  Though it's not as bad as you might think :-)  You can spend an inordinate amount of time improving constant synthesis, generate code that looks really good, but in the end it may not make a bit of difference in real performance.  Been there, done that.  I'm not saying we give up, but we need to keep in mind that we're often better off trading a bit on the constant synthesis if doing so helps code where those constants get used.

Understood :-) I was coming to the same realization, and this seems like a good segue into switching topics and investigating post-reload gcse for constant rematerialization, another pesky issue with RV and one likely to have a bigger impact across a whole bunch of workloads.

FWIW, for the latter case only, IRA emits additional REG_EQUIV notes, which could also be playing a role.
REG_EQUAL notes get promoted to REG_EQUIV notes in some cases.  And when other equivalences are discovered, IRA may create a REG_EQUIV note out of thin air.

The REG_EQUIV note essentially means that everywhere the register occurs you can validly (from a program semantics standpoint) replace the register with the value.  It might require reloading, but it's a valid semantic transformation which may reduce register pressure -- especially for constants that were subject to LICM.

Contrast that with REG_EQUAL, which creates an equivalence at a particular point in the IL; the equivalence may not hold elsewhere in the IL.

Ok. From reading gccint, it seems REG_EQUIV is the stronger form of equivalence and is preferred by post-reload passes, while REG_EQUAL is of more use in pre-reload passes.
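e.g. IIUC, for insn 7 above the stronger form would look something like this (hand-written sketch, not an actual dump):

    (insn 7 6 8 2 (set (reg:DI 137)
         (const_int [0x1010101])) {*mvconst_internal}
         (expr_list:REG_EQUIV (const_int [0x1010101])))

i.e. the note now promises that every use of (reg 137) can be replaced with the constant, not just the value at this particular program point.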


I would also look at reload_cse_regs, which should give us some chance at seeing the value reuse if/when IRA/LRA muck things up.

I'll be out of office for the rest of the week; will look into this once I'm back.

Thx,
-Vineet
