On Thu, Jun 13, 2024 at 7:18 PM Andi Kleen <a...@linux.intel.com> wrote:
>
> Manolis Tsamis <manolis.tsa...@vrull.eu> writes:
> >
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following
> > speedups have been observed.
> >
> >   Neoverse-N1:      +29.4%
> >   Intel Coffeelake: +13.1%
> >   AMD 5950X:        +17.5%
>
> It seems this should have some kind of target hook so that the target
> can configure what forwards should be avoided. At least in x86 land
> there is a trend to the hardware handling more and more cases with each
> generation.
>

Hi Andi,
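For readers following the thread, a minimal illustrative sketch of the
union / type-punning pattern mentioned in the cover letter (made-up names,
not taken from stress-ng): narrow stores are soon followed by a wider load
that overlaps them, which is the situation where the store-forwarding
penalty shows up.

  #include <stdint.h>

  union pun {
    uint32_t half[2];
    uint64_t whole;
  };

  uint64_t
  combine (uint32_t lo, uint32_t hi)
  {
    union pun u;
    /* Two narrow (32-bit) stores ...  */
    u.half[0] = lo;
    u.half[1] = hi;
    /* ... followed shortly by a wider (64-bit) load that overlaps both
       stores.  On many cores the load cannot be serviced by store-to-load
       forwarding and incurs a significant stall.  */
    return u.whole;
  }

Roughly speaking, when the pass can eliminate the load it recomputes the
loaded value from the stored registers instead (zero-extend, shift, or),
trading a few extra instructions for avoiding the memory round trip.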
I have added a target hook for this in v4 of this patch. The hook receives
all the information about the stores, the load, the estimated sequence cost
and whether we expect to eliminate the load. With this information the
target should be able to make an informed decision.

What you mention is also true for AArch64: some microbenchmarking I did
shows that some cores efficiently handle 32bit->64bit store forwarding
while others do not, so a target hook is necessary for such cases.

> Also is there any data what this does to code size? Perhaps it should be
> only done on hot blocks?
>

I haven't seen any large code size increases in general; in large benchmarks
it's usually some tens or a few hundred instructions in total. But in any
case, for v4 I disable the pass based on optimize_insn_for_speed_p, since we
do expect a small size increase.

> And did you see speedups on real applications?

This is still hard to tell. In some benchmarks I have observed either
improvements or regressions, and these are highly susceptible to costing and
to the specific store-forwarding penalties of the CPU. I have also seen cases
where the store-forwarding instance is profitable to avoid but we get bad
code generation for other reasons (usually store_bit_field lowering not being
good enough) and hence a regression. So I believe more time and testing is
needed to really evaluate the speedups that can be achieved.

Thanks,
Manolis

>
> -Andi