> > > I am generally trying to get rid of remaing uses of REG_FREQ since the > > > 10000 based fixed point arithmetics iot always working that well. > > > > > > You can do the sums in profile_count type (doing something reasonable > > > when count is uninitialized) and then convert it to sreal for the final > > > heuristics. > > > > Changed, when count_max is initialized and function is not optimized to > > size, > > use > > bb_count.initialized_p > > ? bb_count.probability_in (count_max).to_sreal () > > : 0.1; > > > > Otherwise, just use 1. > > > > > > n some benchmark, I notice stv failed due to cost unprofitable, but the > > igain > > is inside the loop, but sse<->integer conversion is outside the loop, > > current cost > > model doesn't consider the frequency of those gain/cost. > > The patch weights those cost with frequency. > > > > The patch regressed gcc.target/i386/minmax-6.c under -m32 Since the > > place of integer<->sse is before the branch, and the conversion to > > min/max is in the branch, with static profile, the cost model is not > > profitable anymore which is exactly the patch try to do. > > Considering the original testcase is to guard RA issue, so restrict > > the testcase under ! ia32 should still be ok. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > > > gcc/ChangeLog: > > > > * config/i386/i386-features.cc (get_bb_probability): New function. > > (scalar_chain::mark_dual_mode_def): Weight > > n_integer_to_sse/n_sse_to_integer with bb frequency. > > (general_scalar_chain::compute_convert_gain): Ditto, and > > adjust function prototype to return true/false when cost model > > is profitable or not. > > (timode_scalar_chain::compute_convert_gain): Ditto. > > (convert_scalars_to_vector): Adjust after the upper two > > function prototype are changed. > > * config/i386/i386-features.h (class scalar_chain): Change > > n_integer_to_sse/n_sse_to_integer from int to sreal. > > (class general_scalar_chain): Adjust prototype to return bool > > intead of int. > > (class timode_scalar_chain): Ditto. > > > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/minmax-6.c: Adjust testcase. > > --- > > gcc/config/i386/i386-features.cc | 83 +++++++++++++++++------- > > gcc/config/i386/i386-features.h | 10 +-- > > gcc/testsuite/gcc.target/i386/minmax-6.c | 2 +- > > 3 files changed, 65 insertions(+), 30 deletions(-) > > > > diff --git a/gcc/config/i386/i386-features.cc > > b/gcc/config/i386/i386-features.cc > > index c35ac24fd8a..57dcd94c211 100644 > > --- a/gcc/config/i386/i386-features.cc > > +++ b/gcc/config/i386/i386-features.cc > > @@ -326,6 +326,30 @@ scalar_chain::add_to_queue (unsigned insn_uid) > > insn_uid, chain_id); > > } > > > > +/* Return BB probablity in cfun->cfg->count_max, > > + if count_max is uninitialized or optimized_for_size, return 1. > > + else if bb count is uninitialized, return 0.1 as a rough estimation. > > + else return bb_count.probablity_in (count_max). */ > > +static sreal > > +get_bb_probability (basic_block bb) > > I think you want relative bb frequency that you can get by: > > sreal_freq = bb->count.to_sreal_scale (ENTRY_BLOCK_PTR_FOR_FN (fun)->count) > > The value returned will be 1 if BB is executed once per invocation of > functions and >1 if it is in loop etc. > Also if BB count is uninitialized it will return 1 that is probably > reasonable deafult. > > +{ > > + profile_count bb_count = bb->count; > > + profile_count count_max = cfun->cfg->count_max; > > + sreal sreal_count = 1; > > + if (count_max.initialized_p () > > + && !optimize_function_for_size_p (cfun)) > > + { > > + if (bb_count.initialized_p ()) > > + sreal_count = bb_count.probability_in (count_max).to_sreal (); > > + else > > + /* For unintialized BB, weight it with 0.1, > > + ??? just like REG_FREQ_MAX / BB_FREQ_MAX. */ > > + sreal_count = (sreal) 1 / (sreal) 10; > > + } > > + > > + return sreal_count; > > +} > > + > > /* For DImode conversion, mark register defined by DEF as requiring > > conversion. */ > > > > @@ -337,18 +361,20 @@ scalar_chain::mark_dual_mode_def (df_ref def) > > /* Record the def/insn pair so we can later efficiently iterate over > > the defs to convert on insns not in the chain. */ > > bool reg_new = bitmap_set_bit (defs_conv, DF_REF_REGNO (def)); > > + sreal sreal_count = get_bb_probability (BLOCK_FOR_INSN (DF_REF_INSN > > (def))); > > + > > if (!bitmap_bit_p (insns, DF_REF_INSN_UID (def))) > > { > > if (!bitmap_set_bit (insns_conv, DF_REF_INSN_UID (def)) > > && !reg_new) > > return; > > - n_integer_to_sse++; > > + n_integer_to_sse += sreal_count; > > } > > else > > { > > if (!reg_new) > > return; > > - n_sse_to_integer++; > > + n_sse_to_integer += sreal_count; > > Generaly I am trying to sum counts in profile_count data type and only > later convert them to relative frequencies. This will work too, but + > of sreal is bit more expensive. > > } > > > > if (dump_file) > > @@ -529,15 +555,15 @@ general_scalar_chain::vector_const_cost (rtx exp) > > return ix86_cost->sse_load[smode == DImode ? 1 : 0]; > > } > > > > -/* Compute a gain for chain conversion. */ > > +/* Return true if it's cost profitable for chain conversion. */ > > > > -int > > +bool > > general_scalar_chain::compute_convert_gain () > > { > > bitmap_iterator bi; > > unsigned insn_uid; > > - int gain = 0; > > - int cost = 0; > > + sreal gain = 0; > > + sreal cost = 0; > > so gain is the difference of runtime of integer variant compared to > vector vairant and cost are the extra int->see and sse->int conversions > needed? > > If you scale everything by a BB frequency, you will get a weird > behaviour if chain happens to consist only of instructions in BBs with 0 > frequency (that commonly happens in FDO). > > I think you want to look at the chain first and see if it contains a hot > instruction. If it does, use performance metrics (ix86_cost and scaling > by BB frequencies) if it does not use size metrics (ix86_size_cost and > all scales 1)
There is rtl_ssa::changes_are_worthwhile which solves similar problem. Logic there seems OK to me. It computes unscaled cost in integers and weighted cost that is updated only for hot BBs After both set of costs are computed, if the weighted cost indicates that replacement is worthwhile, it will be done. This works well for chains that contains some portions in cold BBs with frequency of 0 as well as for those that have everything in cold BBs. Honza