On Thu, Nov 28, 2024 at 8:37 AM Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Nov 25, 2024 at 3:28 AM Philipp Tomsich > <philipp.toms...@vrull.eu> wrote: > > > > Pushed to master with the following fixups: > > - new timevar added > > - nits addressed > > - whitespace fixes > > The pass seems to be disabled by default everywhere - I thought we > decided to avoid adding > passes like this because they tend to bit-rot quickly and become a > maintenance burden. > > What was the plan here? > A patch for enabling ASF for O2 or higher is on the list: https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674692.html BR Christoph > > Richard. > > > Philipp. > > > > > > On Mon, 25 Nov 2024 at 03:30, Jeff Law <jeffreya...@gmail.com> wrote: > > > > > > > > > > > > On 11/9/24 2:48 AM, Konstantinos Eleftheriou wrote: > > > > From: kelefth <konstantinos.elefther...@vrull.eu> > > > > > > > > This pass detects cases of expensive store forwarding and tries to > avoid them > > > > by reordering the stores and using suitable bit insertion sequences. > > > > For example it can transform this: > > > > > > > > strb w2, [x1, 1] > > > > ldr x0, [x1] # Expensive store forwarding to larger > load. > > > > > > > > To: > > > > > > > > ldr x0, [x1] > > > > strb w2, [x1] > > > > bfi x0, x2, 0, 8 > > > > > > > > Assembly like this can appear with bitfields or type punning / > unions. > > > > On stress-ng when running the cpu-union microbenchmark the following > speedups > > > > have been observed. > > > > > > > > Neoverse-N1: +29.4% > > > > Intel Coffeelake: +13.1% > > > > AMD 5950X: +17.5% > > > > > > > > The transformation is rejected on cases that would cause > store_bit_field > > > > to generate subreg expressions on different register classes. > > > > Files avoid-store-forwarding-4.c and avoid-store-forwarding-5.c > contain > > > > such cases and have been marked as XFAIL. > > > > > > > > There is a special handling for machines with BITS_BIG_ENDIAN != > > > > BYTES_BIG_ENDIAN. The need for this came up from an issue in H8 > > > > architecture, which uses big-endian ordering, but BITS_BIG_ENDIAN > > > > is false. In that case, the START parameter of store_bit_field > > > > needs to be calculated from the end of the destination register. > > > > > > > > gcc/ChangeLog: > > > > > > > > * Makefile.in (OBJS): Add avoid-store-forwarding.o. > > > > * common.opt (favoid-store-forwarding): New option. > > > > * common.opt.urls: Regenerate. > > > > * doc/invoke.texi: New param store-forwarding-max-distance. > > > > * doc/passes.texi: Document new pass. > > > > * doc/tm.texi: Regenerate. > > > > * doc/tm.texi.in: Document new pass. > > > > * params.opt (store-forwarding-max-distance): New param. > > > > * passes.def: Add pass_rtl_avoid_store_forwarding before > > > > pass_early_remat. > > > > * target.def (avoid_store_forwarding_p): New DEFHOOK. > > > > * target.h (struct store_fwd_info): Declare. > > > > * targhooks.cc (default_avoid_store_forwarding_p): New > function. > > > > * targhooks.h (default_avoid_store_forwarding_p): Declare. > > > > * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare. > > > > * avoid-store-forwarding.cc: New file. > > > > * avoid-store-forwarding.h: New file. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > * gcc.target/aarch64/avoid-store-forwarding-1.c: New test. > > > > * gcc.target/aarch64/avoid-store-forwarding-2.c: New test. > > > > * gcc.target/aarch64/avoid-store-forwarding-3.c: New test. > > > > * gcc.target/aarch64/avoid-store-forwarding-4.c: New test. > > > > * gcc.target/aarch64/avoid-store-forwarding-5.c: New test. > > > > * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: > New test. > > > > * gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: > New test. > > > > > > > > Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu> > > > > Signed-off-by: Konstantinos Eleftheriou < > konstantinos.elefther...@vrull.eu> > > > > > > > > Series-version: 8 > > > > > > > > Series-changes: 8 > > > > - Fix store_bit_field call for big-endian targets, where > > > > BITS_BIG_ENDIAN is false. > > > > - Handle store_forwarding_max_distance = 0 as a special case > that > > > > disables cost checks for avoid-store-forwarding. > > > > - Update testcases for AArch64 and add testcases for x86-64. > > > > > > > > Series-changes: 7 > > > > - Fix bug when copying back the load register, in the case > that the > > > > load is eliminated. > > > > > > > > Series-changes: 6 > > > > - Reject the transformation on cases that would cause > store_bit_field > > > > to generate subreg expressions on different register classes. > > > > Files avoid-store-forwarding-4.c and > avoid-store-forwarding-5.c > > > > contain such cases and have been marked as XFAIL. > > > > - Use optimize_bb_for_speed_p instead of > optimize_insn_for_speed_p. > > > > - Inline and remove get_load_mem. > > > > - New implementation for is_store_forwarding. > > > > - Refactor the main loop in avoid_store_forwarding. > > > > - Avoid using the word 'forwardings'. > > > > - Use lowpart_subreg instead of validate_subreg + > gen_rtx_subreg. > > > > - Don't use df_insn_rescan where not needed. > > > > - Change order of emitting stores and bit insert instructions. > > > > - Check and reject loads for which the dest register overlaps > with src. > > > > - Remove unused variable. > > > > - Change some gen_mov_insn function calls to gen_rtx_SET. > > > > - Subtract the cost of eliminated load, instead of 1, for the > total cost. > > > > - Use delete_insn instead of set_insn_deleted. > > > > - Regenerate common.opt.urls. > > > > - Add some more comments. > > > > > > > > Series-changes: 5 > > > > - Fix bug with BIG_ENDIAN targets. > > > > - Fix bug with unrecognized instructions. > > > > - Fix / simplify pass init/fini. > > > > > > > > Series-changes: 4 > > > > - Change pass scheduling to run after sched1. > > > > - Add target hook to decide whether a store forwarding instance > > > > should be avoided or not. > > > > - Fix bugs. > > > > > > > > Series-changes: 3 > > > > - Only emit SUBREG after calling validate_subreg. > > > > - Fix memory corruption due to vec self-reference. > > > > - Fix bitmap_bit_in_range_p ICE due to BLKMode. > > > > - Reject MEM to MEM sets. > > > > - Add get_load_mem comment. > > > > - Add new testcase. > > > > > > > > Series-changes: 2 > > > > - Allow modes that are not scalar_int_mode. > > > > - Introduce simple costing to avoid unprofitable > transformations. > > > > - Reject bit insert sequences that spill to memory. > > > > - Document new pass. > > > > - Fix and add testcases. > > > > --- > > > > > > > +namespace { > > > > + > > > > +const pass_data pass_data_avoid_store_forwarding = > > > > +{ > > > > + RTL_PASS, /* type. */ > > > > + "avoid_store_forwarding", /* name. */ > > > > + OPTGROUP_NONE, /* optinfo_flags. */ > > > > + TV_NONE, /* tv_id. */ > > > > + 0, /* properties_required. */ > > > > + 0, /* properties_provided. */ > > > > + 0, /* properties_destroyed. */ > > > > + 0, /* todo_flags_start. */ > > > > + TODO_df_finish /* todo_flags_finish. */ > > > > +}; > > > Probably want a TV entry for store forwarding. While it shouldn't be a > > > big time sink, we've seen other passes that should be efficient go nuts > > > in some cases (extension elimination being the most recent example). > > > > > > > > > > > > > > > > + > > > > +/* Try to modify BB so that expensive store forwarding cases are > avoided. */ > > > > + > > > > +void store_forwarding_analyzer::avoid_store_forwarding (basic_block > bb) > > > > > > Formatting nit. Put the return type on its own line so that the > > > function name always starts on column 0 of its own line. > > > > > > > > > > > > > + > > > > +/* Update pass statistics. */ > > > > + > > > > +void store_forwarding_analyzer::update_stats (function *fn) > > > Similarly. > > > > > > > > > OK with the new timevar and two formatting nits. Thanks for your > > > patience on this. > > > > > > Jeff > > > > > > >