Hi Jeff,
Thx for the detailed explanation and insight.
On 6/7/23 16:44, Jeff Law wrote:
With 2e886eef7f2b, recog() starts matching the define_insn_and_split
"*mvconst_internal" during cse1, eliding the insns that synthesize a const_int.
(insn 7 6 8 2 (set (reg:DI 137)
(const_int [0x1010101])) {*mvconst_internal}
(expr_list:REG_EQUAL (const_int [0x1010101])))
[...]
(insn 11 10 12 2 (set (reg:DI 140)
(const_int [0x1010101_00000000])) {*mvconst_internal}
(expr_list:REG_EQUAL (const_int [0x1010101_00000000]) ))
Understood. Not ideal, but we generally don't have good ways to limit
patterns to being available at different times during the optimization
phase. One thing you might want to try (which I thought we used at
one point) would be to make the pattern conditional on cse_not_expected. The
goal would be to avoid exposing the pattern until a later point in the
optimizer pipeline. It may have been the case that we dropped that
over time during development. It's all getting fuzzy at this point.
Gave this a try (roughly the change sketched below) and it seems to fix
Andrew's test, but it regresses the actual large-constant case
(0x1010101_01010101): the mem-to-const_int transformation used to happen in
cse1, which no longer fires, so the constant pool reference from initial
expand survives all the way into the generated asm. I don't think we want to
go back to that state.
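For reference, what I tried looks roughly like this; the predicates,
constraints and split body are paraphrased from memory rather than copied
verbatim from riscv.md, so treat it purely as a sketch of where the
cse_not_expected gating goes:

;; Sketch only -- not verbatim riscv.md.  The point is the insn condition:
;; gating on cse_not_expected keeps recog() from matching the pattern until
;; the CSE passes are done.
(define_insn_and_split "*mvconst_internal"
  [(set (match_operand:GPR 0 "register_operand" "=r")
        (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
  "cse_not_expected"   ;; proposed gating
  "#"
  "&& 1"
  [(const_int 0)]
{
  /* Existing constant-synthesis split body, unchanged.  */
  ...
})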
Eventually split1 breaks it up using the same mvconst_internal splitter,
but the cse opportunity has been lost.
Right. I'd have to look at the pass definitions, but I suspect the
splitting pass where this happens is after the last standard CSE pass.
So we don't get a chance to CSE the constant synthesis.
Yep, split1 and friends happen after cse1 and cse2. At -O2 gcse doesn't
kick in, and even if forced to, it is currently limited in what it can do,
more so given this is post reload.
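Concretely, the reuse that goes missing is roughly this (register numbers
taken from the dump above, RTL simplified by hand):

;; If CSE could still see the synthesis, the high half is just the low half
;; shifted, so reg 140 could reuse reg 137 instead of re-synthesizing
;; 0x1010101 from scratch:
(set (reg:DI 137) (const_int [0x1010101]))                  ;; synthesized once
(set (reg:DI 140) (ashift:DI (reg:DI 137) (const_int 32)))  ;; 0x1010101_00000000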
*This is now the baseline for large-constant handling in the RV backend,
which we all need to be aware of*.
Understood. Though it's not as bad as you might think :-) You can
spend an inordinate amount of time improving constant synthesis,
generate code that looks really good, but in the end it may not make a
bit of difference in real performance. Been there, done that. I'm not
saying we give up, but we need to keep in mind that we're often better
off trading a bit on the constant synthesis if doing so helps code
where those constants get used.
Understood :-) I was coming to the same realization, and this seems like a
good segue into switching topics and investigating post-reload gcse for
Const Rematerialization, another pesky issue with RV and one likely to have
a bigger impact across a whole bunch of workloads.
FWIW, for the latter case only, IRA emits additional REG_EQUIV notes,
which could also be playing a role.
REG_EQUAL notes get promoted to REG_EQUIV notes in some cases. And
when other equivalences are discovered it may create a REG_EQUIV note
out of thin air.
The REG_EQUIV note essentially means that everywhere the register
occurs you can validly (from a program semantics standpoint) replace
the register with the value. It might require reloading, but it's a
valid semantic transformation which may reduce register pressure --
especially for constants that were subject to LICM.
Contrast that with REG_EQUAL, which creates an equivalence at a particular
point in the IL; that equivalence may not hold elsewhere in the IL.
Ok. From reading gccint, it seems REG_EQUIV is a stronger form of
equivalence which post-reload passes prefer, while REG_EQUAL is mainly of
use pre-reload.
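To check my understanding, a hand-written sketch of the difference (not from
an actual dump; insn numbers and the constant reuse the abbreviated form of
the dump above):

;; REG_EQUAL: only says that *at this insn* the value computed into reg 137
;; happens to equal the constant; other sets/uses of reg 137 elsewhere are
;; not covered by the note.
(insn 7 6 8 2 (set (reg:DI 137)
        (plus:DI (reg:DI 135) (reg:DI 136)))
     (expr_list:REG_EQUAL (const_int [0x1010101])))

;; REG_EQUIV: reg 140 is equivalent to the constant *everywhere* it occurs,
;; so the allocator may substitute/rematerialize it at any use (possibly
;; needing a reload), e.g. a constant that LICM hoisted out of a loop.
(insn 11 10 12 2 (set (reg:DI 140) (const_int [0x1010101]))
     (expr_list:REG_EQUIV (const_int [0x1010101])))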
I would also look at reload_cse_regs which should give us some
chance at seeing the value reuse if/when IRA/LRA muck things up.
I'll be out of office for the rest of the week; will look into this once
I'm back.
Thx,
-Vineet