On 10/19/22 01:46, Richard Biener wrote:
On Wed, Oct 19, 2022 at 5:44 AM Jeff Law via Gcc <gcc@gcc.gnu.org> wrote:
On 10/18/22 20:09, Vineet Gupta wrote:
On 10/18/22 16:36, Jeff Law wrote:
There isn't a great place in GCC to handle this right now. If the
constraints were relaxed in PRE, then we'd have a chance, but
getting the cost model right is going to be tough.
It would have been better (for this specific case) if loop unrolling
was not being done so early. The tree pass cunroll is flattening it
out and leaving it for the rest of the tree/RTL passes to pick up the
pieces and remove any redundancies, if at all. It obviously needs to
be early if we are injecting 7x more instructions, but seems like a
lot to unravel.
Yup. If that loop gets unrolled, it's going to be a mess. It will
almost certainly make this problem worse as each iteration is going
to have a pair of constants loaded and no good way to remove them.
That's the original problem that I started this thread with. I'd
snipped the disassembly as it would have been too much text, but
basically on RV the Coremark crc8 loop, with a constant 8 iterations,
gets unrolled, including 8 extraneous insn pairs that load the same
constant, which is preposterous. Other arches side-step this by using
if-conversion / cond moves; the latter is currently WIP in RV
International. x86 w/o if-convert seems OK since the const can be
encoded in the xor insn.
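For readers without the benchmark at hand, here is a minimal sketch of the loop under discussion. It is reconstructed from memory of CoreMark's crcu8 routine, so treat the exact shape as an assumption; the fixed-width types stand in for CoreMark's own typedefs. The conditional xor with 0x4002 is presumably the constant in question: it does not fit a 12-bit RISC-V immediate, so each unrolled copy would need its own two-insn load unless something commons them up.

```c
#include <stdint.h>

/* Sketch of a CoreMark-style CRC-16 step over one byte.
   Assumption: modeled on CoreMark's crcu8; names and types are
   illustrative, not copied from the benchmark source. */
uint16_t crcu8(uint8_t data, uint16_t crc)
{
    uint8_t i, x16, carry;

    for (i = 0; i < 8; i++) {          /* constant trip count: 8 */
        x16 = (uint8_t)((data & 1) ^ ((uint8_t)crc & 1));
        data >>= 1;
        if (x16 == 1) {
            crc ^= 0x4002;             /* constant too wide for a 12-bit immediate */
            carry = 1;
        } else {
            carry = 0;
        }
        crc >>= 1;
        if (carry)
            crc |= 0x8000;
        else
            crc &= 0x7fff;
    }
    return crc;
}
```

Once the loop is completely unrolled, each of the 8 copies of the `if (x16 == 1)` arm carries its own use of the constant, which matches the redundancy described above.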
OTOH given that the gimple/tree pass cunroll is doing the culprit loop
unrolling and introducing the redundant const 8 times, can it be
addressed there somehow?
tree_estimate_loop_size() seems to identify constant expressions, not
just operands. Can it be taught to identify a "non-trivial const"
and hoist/code-move the expression? Sorry, just rambling here, most
likely nonsense.
On GIMPLE all constants are "simple".
Oh, cunroll. There might be a distinct flag for complete unrolling.
At -O3 we peel completely, there's no flag to disable that.
I really expect something like Click's work is the way forward.
Essentially when you VN the function you'll identify those constants and
collapse them all down to a single instance. Then the GCM phase will
kick in and find a place to put the evaluation so that you have one and
only one.
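To make the VN half of that idea concrete, here is a toy sketch, not GCC code. It assumes a made-up straight-line IR (struct insn, OP_CONST, OP_XOR and dedup_constants are all hypothetical names) and only value-numbers constant loads: every later load of the same immediate is redirected to the first instance. Because the code is straight-line, the first instance trivially dominates all uses; a real GCM phase would instead pick the placement using the dominator tree and loop nesting.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy IR: an insn is identified by its index in the array. */
enum op { OP_CONST, OP_XOR, OP_DEAD };

struct insn {
    enum op op;
    int32_t imm;      /* OP_CONST: the immediate being materialized */
    int src1, src2;   /* OP_XOR: indices of the defining insns */
};

/* Collapse all loads of the same constant down to the earliest one.
   Returns the number of constant loads that became dead. */
size_t dedup_constants(struct insn *code, size_t n)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (code[i].op != OP_CONST)
            continue;
        /* Look for an earlier live load of the same value. */
        for (size_t j = 0; j < i; j++) {
            if (code[j].op != OP_CONST || code[j].imm != code[i].imm)
                continue;
            /* Redirect every later use of insn i to insn j. */
            for (size_t k = i + 1; k < n; k++) {
                if (code[k].op != OP_XOR)
                    continue;
                if (code[k].src1 == (int)i)
                    code[k].src1 = (int)j;
                if (code[k].src2 == (int)i)
                    code[k].src2 = (int)j;
            }
            code[i].op = OP_DEAD;
            removed++;
            break;
        }
    }
    return removed;
}
```

The sketch ignores register pressure entirely, which is the cost-model question raised earlier in the thread: keeping one instance live across all uses is only a win if a register is available for it.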
I'd say postreload gcse would be a place to do that. At least when
there's no available hardreg, CSEing likely isn't going to be a win.
That's an interesting idea. Do it aggressively post-reload when we know
there's a register available. Vineet, that seems like it's worth
investigation.
jeff