On 10/23/24 8:27 AM, Konstantinos Eleftheriou wrote:
From: kelefth <konstantinos.elefther...@vrull.eu>

This pass detects cases of expensive store forwarding and tries to avoid them
by reordering the stores and using suitable bit insertion sequences.
For example it can transform this:

      strb    w2, [x1, 1]
      ldr     x0, [x1]      # Expensive store forwarding to larger load.

To:

      ldr     x0, [x1]
      strb    w2, [x1, 1]
      bfi     x0, x2, 8, 8

Assembly like this can appear with bitfields or type punning / unions.
On stress-ng when running the cpu-union microbenchmark the following speedups
have been observed.

   Neoverse-N1:      +29.4%
   Intel Coffeelake: +13.1%
   AMD 5950X:        +17.5%
[ ... ]
This still seems to have some correctness issues. The H8 port reports the following when the pass is enabled by default:

Tests that now fail, but worked before (8 tests):

h8300-sim/-mh/-mint32: gcc: gcc.c-torture/execute/pr63843.c   -O2  execution test
[ ... ]

It looks like we miss setting the high half of the register. The good sequence looks like:

!       mov.b   @er2,r3l
!       mov.b   r3l,@(2,er7)
        mov.b   @(1,er2),r2l
        mov.b   r2l,@(3,er7)
!       mov.w   @(2,er7),r0

Note the word (16-bit) move at the end of the sequence that sets both halves of the r0 register.

The broken sequence looks like this:

!       mov.b   @er2,r0l
        mov.b   @(1,er2),r2l
+       mov.b   r0l,@(2,er7)
        mov.b   r2l,@(3,er7)
!       mov.b   r2l,r0l

Note how all the assignments are byte sized and that nothing sets r0h. We get whatever value happened to be lying around in the high half of the register.

You should be able to see this with an H8 cross compiler and shouldn't need a full toolchain to test. Compile with -O2 -mh -mint32. There is only one opportunity for SFB avoidance in pr63843.

There's also a failure for bfin-elf, but I suspect it's ultimately the same underlying issue. Obviously I'll retest once there's a fix.

jeff

