On 10/18/22 16:36, Jeff Law wrote:
There isn't a great place in GCC to handle this right now. If the
constraints were relaxed in PRE, then we'd have a chance, but
getting the cost model right is going to be tough.
It would have been better (for this specific case) if loop unrolling
were not done so early. The tree pass cunroll flattens the loop and
leaves it to all the remaining tree/RTL passes to pick up the pieces
and remove any redundancies, if they manage to at all. It obviously
needs to run early if it is injecting 7x more instructions, but that
seems like a lot to unravel.
Yup. If that loop gets unrolled, it's going to be a mess. It will
almost certainly make this problem worse as each iteration is going to
have a pair of constants loaded and no good way to remove them.
That's the original problem I started this thread with. I'd snipped
the disassembly since it would have been too much text, but basically on
RV the Coremark crc8 loop, which has a constant 8 iterations, gets
unrolled along with 8 extraneous instruction pairs that each load the
same constant, which is preposterous. Other arches side-step this by
using if-conversion / conditional moves; the latter are currently WIP in
RV International. x86 without if-conversion seems OK, since the constant
can be encoded directly in the xor instruction.
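For reference, here is a sketch of the kind of loop being discussed, modeled on Coremark's crcu8() (reproduced from memory, so treat it as approximate), next to a hand-written branch-free variant that mimics what if-conversion / conditional moves effectively produce. The names crcu8_branchy, crcu8_branchless and crcu8_forms_agree are illustrative only. The constant 0x4002 lies outside the signed 12-bit immediate range (-2048..2047) of RISC-V I-type instructions, so on RV it has to be materialized into a register first (typically a lui/addi pair), whereas x86 can encode it as an xor immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form, modeled on Coremark's crcu8(); an approximation of the
   benchmark source, not a verbatim copy. Constant trip count of 8. */
uint16_t crcu8_branchy(uint8_t data, uint16_t crc)
{
    for (int i = 0; i < 8; i++) {
        uint8_t x16 = (uint8_t)((data & 1) ^ ((uint8_t)crc & 1));
        uint8_t carry;
        data >>= 1;
        if (x16 == 1) {
            crc ^= 0x4002;       /* too wide for a 12-bit RV immediate */
            carry = 1;
        } else {
            carry = 0;
        }
        crc >>= 1;
        if (carry)
            crc |= 0x8000;
        else
            crc &= 0x7fff;
    }
    return crc;
}

/* Branch-free form: roughly what if-conversion / conditional moves yield.
   mask is 0xFFFF when the low bits differ and 0 otherwise. */
uint16_t crcu8_branchless(uint8_t data, uint16_t crc)
{
    for (int i = 0; i < 8; i++) {
        uint16_t x16 = (uint16_t)((data ^ crc) & 1);
        uint16_t mask = (uint16_t)-x16;
        data >>= 1;
        crc ^= (uint16_t)(0x4002 & mask);
        crc >>= 1;
        crc = (uint16_t)((crc & 0x7fff) | (0x8000 & mask));
    }
    return crc;
}

/* Returns 1 if both forms agree for every data byte and every 7th
   starting crc value in 0..0xFFFF, 0 otherwise. */
int crcu8_forms_agree(void)
{
    for (uint32_t crc = 0; crc <= 0xFFFF; crc += 7)
        for (uint32_t data = 0; data <= 0xFF; data++)
            if (crcu8_branchy((uint8_t)data, (uint16_t)crc) !=
                crcu8_branchless((uint8_t)data, (uint16_t)crc))
                return 0;
    return 1;
}
```

Compiling either function at -O3 for an RV target and reading the assembly is one way to see whether the constant is re-materialized in each unrolled copy; the exact code generated will depend on the GCC version and flags.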
OTOH, given that the gimple/tree pass cunroll is doing the culprit loop
unrolling and introducing the redundant constant 8 times, can it be
addressed there somehow?
tree_estimate_loop_size() seems to identify constant expressions, not
just operands. Could it be taught to identify a "non-trivial constant"
and hoist/code-move the expression? Sorry, just rambling here; most
likely nonsense.
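To put rough numbers on the hoisting idea: assuming the constant takes two instructions to materialize (a lui/addi pair, typical on RV for a value outside the 12-bit immediate range) and the loop runs 8 times, full unrolling without hoisting spends 8 x 2 = 16 instructions on the constant, versus 2 if it were materialized once ahead of the unrolled body. The helper names below are hypothetical, and this is plain arithmetic, not a model of what tree_estimate_loop_size() actually computes.

```c
#include <assert.h>

/* Hypothetical back-of-the-envelope model, NOT GCC's size estimator:
   instructions spent materializing one loop-invariant constant after a
   loop with a constant trip count is fully unrolled. */

/* Constant re-materialized in every unrolled copy of the body. */
unsigned const_cost_rematerialized(unsigned trip_count, unsigned insns_per_const)
{
    return trip_count * insns_per_const;
}

/* Constant hoisted and materialized once, before the unrolled body. */
unsigned const_cost_hoisted(unsigned trip_count, unsigned insns_per_const)
{
    (void)trip_count; /* one-time cost, independent of the trip count */
    return insns_per_const;
}

/* Instructions saved by hoisting; assumes trip_count >= 1. */
unsigned const_cost_saved(unsigned trip_count, unsigned insns_per_const)
{
    return const_cost_rematerialized(trip_count, insns_per_const) -
           const_cost_hoisted(trip_count, insns_per_const);
}
```

With trip_count = 8 and insns_per_const = 2, const_cost_saved() returns 14, i.e. the 7 redundant instruction pairs.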
FWIW, -fno-unroll-loops only seems to work at -O2. At -O3 it always
unrolls. Is that expected?
The only case I'm immediately aware of where this wouldn't work would
be if -O3 came after -fno-unroll-loops.
Weird that gcc-12, gcc-11 and gcc-10 all seem to silently ignore
-fno-unroll-loops even though it follows -O3. Perhaps a different
toggle is needed to suppress the issue.
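One per-loop alternative that may be worth trying is GCC's loop pragma. The GCC manual documents that `#pragma GCC unroll n` with a value of 0 or 1 blocks unrolling of the loop that immediately follows; whether that also stops cunroll's complete unrolling in gcc-10..12 for this loop is an assumption to confirm in the generated assembly. The loop below is a simplified CRC-style loop modeled on Coremark's crcu8() (reproduced from memory; the redundant `crc &= 0x7fff` branch is dropped since the shift already clears the top bit), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: CRC-style loop with a constant trip count of 8,
   annotated so that GCC is asked not to unroll it. Whether cunroll honors
   this in a given GCC release is an assumption to verify. */
uint16_t crcu8_no_unroll(uint8_t data, uint16_t crc)
{
#pragma GCC unroll 1
    for (int i = 0; i < 8; i++) {
        uint8_t x16 = (uint8_t)((data ^ crc) & 1); /* low bits differ? */
        data >>= 1;
        if (x16)
            crc ^= 0x4002;  /* constant that RV must materialize */
        crc >>= 1;          /* top bit is now 0 */
        if (x16)
            crc |= 0x8000;  /* shift the carry back into the top bit */
    }
    return crc;
}
```

Unlike a command-line flag, the pragma applies to one loop regardless of where -O3 appears on the command line, which is why it could sidestep the option-ordering question above.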
Thx,
-Vineet