On Fri, 7 Jul 2023 at 10:28, Di Zhao OS via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Update the patch so it can apply.
>
> Tested on spec2017 fprate cases again. With option "-funroll-loops -Ofast
> -flto",
> the improvements of 1-copy run are:
>
> Ampere1:
>     508.namd_r      4.26%
>     510.parest_r    2.55%
>     Overall         0.54%
> Intel Xeon:
>     503.bwaves_r    1.3%
>     508.namd_r      1.58%
>     overall         0.42%
This looks like a worthwhile improvement.

From reviewing the patch, a few nit-picks:
- given that 'has_fma' can now take three values { -1, 0, 1 }, an enum
  with more descriptive names for these 3 states should be used;
- tests such as "has_fma >= 0" and "fma > 0" are hard to read; after
  changing this to an enum, you can use macros or helper functions to
  test the predicates (i.e., *_P macros or *_p helpers) for readability;
- the meaning of the return values of rank_ops_for_fma should be
  documented in the comment describing the function;
- changing convert_mult_to_fma_1 to return a tree* (i.e., return_lhs or
  NULL_TREE) removes the need for an in/out parameter.

Thanks,
Philipp.

>
> Thanks,
> Di Zhao
>
> > -----Original Message-----
> > From: Di Zhao OS
> > Sent: Friday, June 16, 2023 4:51 PM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH] tree-optimization/110279- Check for nested FMA chains in
> > reassoc
> >
> > This patch is to fix the regressions found in SPEC2017 fprate cases
> > on aarch64.
> >
> > 1. Reused code in pass widening_mul to check for nested FMA chains
> > (those connected by MULT_EXPRs), since re-writing to parallel
> > generates worse codes.
> >
> > 2. Avoid re-arrange to produce less FMA chains that can be slow.
> >
> > Tested on ampere1 and neoverse-n1, this fixed the regressions in
> > 508.namd_r and 510.parest_r 1 copy run. While I'm still collecting data
> > on x86 machines we have, I'd like to know what do you think of this.
> >
> > (Previously I tried to improve things with FMA by adding a widening_mul
> > pass before reassoc2 for it's easier to recognize different patterns
> > of FMA chains and decide whether to split them. But I suppose handling
> > them all in reassoc pass is more efficient.)
> >
> > Thanks,
> > Di Zhao
> >
> > ---
> > gcc/ChangeLog:
> >
> >	* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Add new parameter.
> >	Support new mode that merely do the checking.
> >	(struct fma_transformation_info): Moved to header.
> >	(class fma_deferring_state): Moved to header.
> >	(convert_mult_to_fma): Add new parameter.
> >	* tree-ssa-math-opts.h (struct fma_transformation_info):
> >	(class fma_deferring_state): Moved from .cc.
> >	(convert_mult_to_fma): Add function decl.
> >	* tree-ssa-reassoc.cc (rewrite_expr_tree_parallel):
> >	(rank_ops_for_fma): Return -1 if nested FMAs are found.
> >	(reassociate_bb): Avoid rewriting to parallel if nested FMAs are
> >	found.
>
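
For illustration, the enum-plus-predicate suggestion from the review
could look roughly like the sketch below. All names here (fma_state,
fma_parallel_rewrite_ok_p, fma_candidates_p) are hypothetical and do not
come from the patch. Only the meaning of -1 (nested FMA chains found) is
stated in the ChangeLog; the meanings given to 0 and 1 are assumptions.

```cpp
// Hypothetical replacement for the int-valued has_fma { -1, 0, 1 }.
enum class fma_state
{
  NESTED_CHAIN = -1, // nested FMA chains found (the patch's -1)
  NO_FMA = 0,        // assumed meaning of 0: no FMA candidates
  HAS_FMA = 1        // assumed meaning of 1: FMA candidates present
};

// Stands in for a test like "has_fma >= 0": rewriting to parallel is
// only avoided when nested FMA chains were found.
constexpr bool
fma_parallel_rewrite_ok_p (fma_state s)
{
  return s != fma_state::NESTED_CHAIN;
}

// Stands in for a test like "has_fma > 0".
constexpr bool
fma_candidates_p (fma_state s)
{
  return s == fma_state::HAS_FMA;
}
```

With helpers like these, call sites read as a named predicate rather
than a comparison against a magic number, and the comment on
rank_ops_for_fma can document the three enumerators directly.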