On 10/18/22 16:36, Jeff Law wrote:
There isn't a great place in GCC to handle this right now.  If the constraints were relaxed in PRE, then we'd have a chance, but getting the cost model right is going to be tough.

It would have been better (for this specific case) if loop unrolling were not done so early. The tree pass cunroll flattens the loop out and leaves it to the rest of the tree/RTL passes to pick up the pieces and remove any redundancies, if at all. It obviously needs to be early if we are injecting 7x more instructions, but that seems like a lot to unravel.

Yup.  If that loop gets unrolled, it's going to be a mess.  It will almost certainly make this problem worse as each iteration is going to have a pair of constants loaded and no good way to remove them.

That's the original problem that I started this thread with. I'd snipped the disassembly as it would have been too much text, but basically on RV the Coremark crc8 loop of a constant 8 iterations gets unrolled, including 8 extraneous insn pairs to load the same constant, which is preposterous. Other arches side-step this by using if-conversion / cond moves, the latter currently WIP at RV International. x86 w/o if-convert seems OK since the const can be encoded in the xor insn.
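For reference, the loop in question looks roughly like the sketch below. This is written from memory along the lines of CoreMark's crcu8 and is not copied from the benchmark source or from the snipped disassembly; the stdint types stand in for CoreMark's own typedefs and are an assumption. The constant 0x4002 sits inside a conditional in the loop body, so full unrolling of the 8 iterations leaves 8 copies of it.

```c
#include <stdint.h>

/* Sketch of a bitwise CRC-16 update over one byte, in the style of
 * CoreMark's crcu8. uint8_t/uint16_t are stand-ins for the benchmark's
 * own typedefs (an assumption). */
uint16_t crcu8(uint8_t data, uint16_t crc)
{
    uint8_t i, x16, carry;

    for (i = 0; i < 8; i++) {            /* constant trip count of 8 */
        x16 = (uint8_t)((data & 1) ^ ((uint8_t)crc & 1));
        data >>= 1;

        if (x16 == 1) {
            crc ^= 0x4002;               /* the constant that gets re-materialized */
            carry = 1;
        } else {
            carry = 0;
        }

        crc >>= 1;
        if (carry)
            crc |= 0x8000;
        else
            crc &= 0x7fff;
    }
    return crc;
}
```

On a target where 0x4002 does not fit an xor immediate, each unrolled copy needs its own constant load unless a later pass commons them.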

OTOH, given that the gimple/tree pass cunroll is doing the culprit loop unrolling and introducing the redundant const 8 times, can it be addressed there somehow? tree_estimate_loop_size() seems to identify constant expressions, not just operands. Can it be taught to identify a "non-trivial const" and hoist/code-move the expression? Sorry, just rambling here, most likely nonsense.



FWIW -fno-unroll-loops only seems to work at -O2. At -O3 it always unrolls. Is that expected?

The only case I'm immediately aware of where this wouldn't work would be if -O3 came after -fno-unroll-loops.

Weird that gcc-12, gcc-11 and gcc-10 all seem to silently ignore -fno-unroll-loops even when it follows -O3. Perhaps a different toggle is needed to suppress the issue.
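One candidate for a per-loop toggle is the documented `#pragma GCC unroll n`, where the GCC manual says a value of 0 or 1 blocks any unrolling of the loop that follows. The sketch below only shows where the pragma would go; the function name is made up for illustration, the loop body is the same from-memory approximation of the CoreMark routine, and whether cunroll honors the pragma for this loop at -O3 would need checking in the generated code.

```c
#include <stdint.h>

/* Hypothetical variant of the crc8 loop with unrolling blocked per loop.
 * The function name and stdint types are illustrative assumptions. */
uint16_t crcu8_no_unroll(uint8_t data, uint16_t crc)
{
    uint8_t i, x16, carry;

    /* Must sit immediately before the loop it applies to. */
#pragma GCC unroll 1
    for (i = 0; i < 8; i++) {
        x16 = (uint8_t)((data & 1) ^ ((uint8_t)crc & 1));
        data >>= 1;

        if (x16 == 1) {
            crc ^= 0x4002;
            carry = 1;
        } else {
            carry = 0;
        }

        crc >>= 1;
        if (carry)
            crc |= 0x8000;
        else
            crc &= 0x7fff;
    }
    return crc;
}
```

This changes the benchmark source, so it is only useful as a diagnostic to confirm where the redundant constant loads come from, not as a fix.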

Thx,
-Vineet
