Hello Richard,

Thanks for the review!
> The fact that you have to adjust gcc.dg/tree-ssa/pr38533.c looks problematic
> to me.  Can you investigate the problem report, look at the generated
> code with the atom default of the param and see whether it's still
> reasonably "fixed" (maybe you'd already done this)?

This test was created as a regression test for a problem in
linearize_expr_tree, which moves all statements down to the first modified
one during reassociation, increasing register pressure.  The test has many
definitions which are all ORed together like this:

  def r1
  def r2
  s = r1 or r2
  def r3
  s = s or r3
  def r4
  s = s or r4
  ...

and it checks that register pressure is not increased after reassociation by
a simple scan for two sequential defs.  If we use a reassociation width
higher than 1, the test fails, because we have to increase register pressure
to get parallelism and end up with code like this:

  def r1
  def r2
  def r3
  t1 = r1 or r2
  s = s or r3
  def r4
  def r5
  s = s or t1
  t1 = r4 or r5
  ...

So I had to adjust the test.

> +  /* Check if we may use less width and still compute sequence for
> +     the same time.  It will allow us to reduce registers usage.  */
> +  while (width > 1
> +         && get_required_cycles (ops_num, width - 1) == cycles_best)
> +    width--;
>
> I suppose get_required_cycles () is monotonic in width?  Would it
> make sense to binary search the best width then (I realize the
> default is 1 and the only target differing has 2, but ...)?  Maybe at
> least add a comment to this effect?  Or not decrement by one
> but instead divide by two on each iteration (which is the same for 1 and 2
> ...)?
> It's also all a mapping that is constant - we should be able to
> pre-compute it for all values, eventually "compressing" it to a
> much simpler formula width = f (cpu_width, ops_num)?

I replaced the sequential search with a binary search.  I did not pay much
attention to this function because I do not think it is really time
consuming compared to other parts of the reassociation pass.  Am I too
optimistic?
If you think it might significantly affect compile time, I can introduce a
table of pre-computed values (or do it later as a separate fix).

I made all the other fixes as you suggested.

Bootstrapped and checked on x86_64-linux.

Thanks,
Ilya
---
gcc/

2011-08-31  Enkovich Ilya  <ilya.enkov...@intel.com>

	PR middle-end/44382
	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): Likewise.
	* doc/tm.texi (reassociation_width): Likewise.
	* doc/invoke.texi (tree-reassoc-width): New param documented.
	* hooks.h (hook_int_uint_mode_1): New default hook.
	* hooks.c (hook_int_uint_mode_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of new hook for
	i386 target.
	* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(swap_ops_for_binary_stmt): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(rewrite_expr_tree): Refactored.  Part of code moved into
	swap_ops_for_binary_stmt.
	(reassociate_bb): Now checks reassociation width to be used
	and calls rewrite_expr_tree_parallel instead of
	rewrite_expr_tree if needed.

gcc/testsuite/

2011-08-31  Enkovich Ilya  <ilya.enkov...@intel.com>

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
	--param tree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.
PR44382.diff