On Mon, Mar 23, 2020 at 10:53 AM Yangfei (Felix) <felix.y...@huawei.com> wrote: > > Hi, > > I created a bug for this issue: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94269 > Looks like widening_mul phase may move multiply instruction from outside > the loop to inside the loop, merging with one add instruction inside the loop. > This will increase the cost of the loop at least on aarch64 (4 cycles vs 1 > cycle). I think widening_mul should consider block frequency when doing such > a combination. > I mean something like:
As written in the PR I'd follow fma generation and restrict defs to the same BB. Richard. > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c > index 54ba035..4439452 100644 > --- a/gcc/tree-ssa-math-opts.c > +++ b/gcc/tree-ssa-math-opts.c > @@ -2721,7 +2721,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, > gimple *stmt, > { > if (!has_single_use (rhs1) > || !is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1, > - &type2, &mult_rhs2)) > + &type2, &mult_rhs2) > + || (gimple_bb (rhs1_stmt) != gimple_bb (stmt) > + && gimple_bb (rhs1_stmt)->count.to_frequency(cfun) > + < gimple_bb (stmt)->count.to_frequency(cfun))) > return false; > add_rhs = rhs2; > conv_stmt = conv1_stmt; > @@ -2730,7 +2733,10 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, > gimple *stmt, > { > if (!has_single_use (rhs2) > || !is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1, > - &type2, &mult_rhs2)) > + &type2, &mult_rhs2) > + || (gimple_bb (rhs2_stmt) != gimple_bb (stmt) > + && gimple_bb (rhs2_stmt)->count.to_frequency(cfun) > + < gimple_bb (stmt)->count.to_frequency(cfun))) > return false; > add_rhs = rhs1; > conv_stmt = conv2_stmt; > > Comments? > > Thanks, > Felix