Hi Jeff, thanks for the feedback. Indeed, there was an issue with copying back the load register when the load is eliminated. I just sent a new version (https://gcc.gnu.org/pipermail/gcc-patches/2024-October/666230.html).
On Fri, Oct 18, 2024 at 9:55 PM Jeff Law <jeffreya...@gmail.com> wrote: > > > > On 10/18/24 3:57 AM, Konstantinos Eleftheriou wrote: > > From: kelefth <konstantinos.elefther...@vrull.eu> > > > > This pass detects cases of expensive store forwarding and tries to avoid > > them > > by reordering the stores and using suitable bit insertion sequences. > > For example it can transform this: > > > > strb w2, [x1, 1] > > ldr x0, [x1] # Expensive store forwarding to larger load. > > > > To: > > > > ldr x0, [x1] > > strb w2, [x1] > > bfi x0, x2, 0, 8 > > > > Assembly like this can appear with bitfields or type punning / unions. > > On stress-ng when running the cpu-union microbenchmark the following > > speedups > > have been observed. > > > > Neoverse-N1: +29.4% > > Intel Coffeelake: +13.1% > > AMD 5950X: +17.5% > So just fired this up on the crosses after enabling it by default. It's > still got several hours to go, but there's a pretty clear goof in here > that's showing up on multiple targets. > > Just taking mcore-elf as an example, we're mis-compiling muldi3 from libgcc. > > We have this in the .asmcons dump: > > > (insn 37 36 40 4 (set (mem/j/c:SI (reg/f:SI 8 r8) [1 MEM[(union > > *)_61].s.low+0 S4 A64]) > > (reg:SI 77 [ _10 ])) "/home/jlaw/test/gcc/libgcc/libgcc2.c":532:649 > > discrim 3 65 {*mcore.md:1196} > > (expr_list:REG_DEAD (reg:SI 77 [ _10 ]) > > (nil))) > [ ... ] > > > (insn 44 43 45 4 (set (mem/j/c:SI (plus:SI (reg/f:SI 8 r8) > > (const_int 4 [0x4])) [1 MEM[(union *)_61].s.high+0 S4 A32]) > > (reg:SI 81 [ _18 ])) "/home/jlaw/test/gcc/libgcc/libgcc2.c":534:12 > > 65 {*mcore.md:1196} > > (expr_list:REG_DEAD (reg:SI 81 [ _18 ]) > > (nil))) > > (note 45 44 49 4 NOTE_INSN_DELETED) > > (insn 49 45 50 4 (set (reg/i:DI 2 r2) > > (mem/j/c:DI (reg/f:SI 8 r8) [1 MEM[(union *)_61].ll+0 S8 A64])) > > "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 68 {movdi_i} > > (nil)) > > So we've got two SImode stores which are then loaded in DImode a bit > later to set the return value for the function. A very clear > opportunity to do store forwarding. > > > In the store-forwarding dump we have: > > (insn 70 36 40 4 (set (reg:SI 95) > > (reg:SI 77 [ _10 ])) "/home/jlaw/test/gcc/libgcc/libgcc2.c":532:649 > > discrim 3 65 {*mcore.md:1196} > > (nil)) > [ ... ] > > (insn 67 43 45 4 (set (reg:SI 94) > > (reg:SI 81 [ _18 ])) "/home/jlaw/test/gcc/libgcc/libgcc2.c":534:12 > > 65 {*mcore.md:1196} > > (nil)) > > (note 45 67 71 4 NOTE_INSN_DELETED) > > (insn 71 45 69 4 (set (mem/j/c:SI (reg/f:SI 8 r8) [1 MEM[(union > > *)_61].s.low+0 S4 A64]) > > (reg:SI 95)) "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 65 > > {*mcore.md:1196} > > (nil)) > > (insn 69 71 68 4 (set (reg:DI 93) > > (subreg:DI (reg:SI 95) 0)) > > "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 68 {movdi_i} > > (nil)) > > (insn 68 69 66 4 (set (mem/j/c:SI (plus:SI (reg/f:SI 8 r8) > > (const_int 4 [0x4])) [1 MEM[(union *)_61].s.high+0 S4 A32]) > > (reg:SI 94)) "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 65 > > {*mcore.md:1196} > > (nil)) > > (insn 66 68 50 4 (set (subreg:SI (reg:DI 93) 4) > > (reg:SI 94)) "/home/jlaw/test/gcc/libgcc/libgcc2.c":538:1 65 > > {*mcore.md:1196} > > (nil)) > > Note that we never put a value into (reg:DI 2), so the return value from > this routine is garbage, naturally leading to testsuite failures. > > It looks like we're missing a copy from (reg:DI 93) to (reg:DI 2) to me. > > You should be able to see this with a cross compiler and don't need > binutils/gas, newlib, etc. > > Compile the attached testcase with an mcore-elf configured compiler with > -O3 -favoid-store-forwarding > > > Related, but obviously not a requirement to go forward. After the SFB > elimination, the two stores at insns 71, 68 are dead and could be > removed. In theory DSE should have eliminated them, but isn't for > reasons I haven't investigated. > > Jeff >