On 10/23/24 8:27 AM, Konstantinos Eleftheriou wrote:
From: kelefth <konstantinos.elefther...@vrull.eu>
This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example, it can transform this:
strb w2, [x1, 1]
ldr x0, [x1] # Expensive store forwarding to larger load.
To:
ldr x0, [x1]
strb w2, [x1, 1]
bfi x0, x2, 8, 8
Assembly like this can appear with bitfields or type punning / unions.
On stress-ng, when running the cpu-union microbenchmark, the following
speedups have been observed:
Neoverse-N1: +29.4%
Intel Coffeelake: +13.1%
AMD 5950X: +17.5%
[ ... ]
This still seems to have some correctness issues. The H8 port reports
this when the pass is enabled by default:
Tests that now fail, but worked before (8 tests):
h8300-sim/-mh/-mint32: gcc: gcc.c-torture/execute/pr63843.c -O2 execution test
[ ... ]
It looks like we fail to set the high half of the register. The good
sequence looks like:
! mov.b @er2,r3l
! mov.b r3l,@(2,er7)
mov.b @(1,er2),r2l
mov.b r2l,@(3,er7)
! mov.w @(2,er7),r0
Note the word (16-bit) move at the end of the sequence that sets both
halves of the r0 register.
The broken sequence looks like this:
! mov.b @er2,r0l
mov.b @(1,er2),r2l
+ mov.b r0l,@(2,er7)
mov.b r2l,@(3,er7)
! mov.b r2l,r0l
Note how all the assignments are byte sized and that nothing sets r0h.
We get whatever value happened to be lying around in the high half of
the register.
You should be able to see this with an H8 cross compiler and shouldn't
need a full toolchain to test. Compile with -O2 -mh -mint32. There is
only one opportunity for SFB avoidance in pr63843.
There's also a failure for bfin-elf, but I suspect it's ultimately the
same underlying issue. Obviously I'll retest once there's a fix.
jeff