Ilya, thanks for posting this! This patch is also useful on powerpc64. Applying it fixed a performance regression in bwaves caused by a loss of reassociation somewhere between 4.5 and 4.6 (still tracking it down). With -ftree-reassoc-width=2 applied to bwaves, the better code generation returns.

Bill

On Tue, 2011-07-12 at 16:30 +0400, Илья Энкович wrote:
> Hello,
>
> Here is a patch related to missed optimization opportunity in tree
> reassoc phase.
>
> Currently tree reassoc phase always generates a linear form which
> requires the minimum registers but has the highest tree height and
> does not allow computation to be performed in parallel. It may be
> critical for performance if required operations have high latency but
> can be pipelined (i.e. few execution units or low throughput). This
> problem becomes important on current Atom processors which are
> in-order and have many such instructions: IMUL and scalar SSE FP
> instructions.
>
> This patch introduces a new feature to tree reassoc phase to generate
> computation tree with reduced height allowing to perform few
> long-latency instructions in parallel. It changes only one part of
> reassociation - rewrite_expr_tree. A level of parallelism is
> controlled via target hook and/or command line option.
>
> New feature is enabled for Atom only by default. Patch boosts mostly
> CFP2000 geomean on Atom: +3.04% for 32 bit and +0.32% for 64 bit.
>
> Bootstrapped and checked on x86_64-linux.
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2011-07-12  Enkovich Ilya  <ilya.enkov...@intel.com>
>
>     * target.def (reassociation_width): New hook.
>
>     * doc/tm.texi.in (reassociation_width): New hook documentation.
>
>     * doc/tm.texi (reassociation_width): Likewise.
>
>     * hooks.h (hook_int_const_gimple_1): New default hook.
>
>     * hooks.c (hook_int_const_gimple_1): Likewise.
>
>     * config/i386/i386.h (ix86_tune_indices): Add
>     X86_TUNE_REASSOC_INT_TO_PARALLEL and
>     X86_TUNE_REASSOC_FP_TO_PARALLEL.
>
>     (TARGET_REASSOC_INT_TO_PARALLEL): New.
>     (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
>
>     * config/i386/i386.c (initial_ix86_tune_features): Add
>     X86_TUNE_REASSOC_INT_TO_PARALLEL and
>     X86_TUNE_REASSOC_FP_TO_PARALLEL.
>
>     (ix86_reassociation_width) implementation of
>     new hook for i386 target.
>
>     * common.opt (ftree-reassoc-width): New option added.
>
>     * tree-ssa-reassoc.c (get_required_cycles): New function.
>     (get_reassociation_width): Likewise.
>     (rewrite_expr_tree_parallel): Likewise.
>
>     (reassociate_bb): Now checks reassociation width to be used
>     and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
>     if needed.
>
>     (pass_reassoc): TODO_remove_unused_locals flag added.
>
> gcc/testsuite/
>
> 2011-07-12  Enkovich Ilya  <ilya.enkov...@intel.com>
>
>     * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
>     -ftree-reassoc-width=1.
>
>     * gcc.dg/tree-ssa/reassoc-24.c: New test.
>     * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
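
For readers following the thread, here is a rough sketch of the two shapes discussed in the quoted message, written out by hand in C. It is not taken from the patch or from the pass's actual output; the function names sum_linear, sum_width2 and check_forms_agree are hypothetical and exist only for illustration. Unsigned integers are used so that both orderings are guaranteed to give the same result; for floating point the two forms can round differently.

```c
#include <assert.h>

/* Linear form: every addition depends on the previous one,
   so the dependency chain is three additions long. */
unsigned sum_linear(unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned t1 = a + b;
    unsigned t2 = t1 + c;
    return t2 + d;
}

/* Reduced-height form (width 2): t1 and t2 do not depend on each
   other, so a pipelined unit can overlap them. The chain is two
   additions long, at the cost of one more live temporary. */
unsigned sum_width2(unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned t1 = a + b;
    unsigned t2 = c + d;
    return t1 + t2;
}

/* Hypothetical self-check: both forms compute the same value,
   including when the unsigned arithmetic wraps around. */
void check_forms_agree(void)
{
    assert(sum_linear(1u, 2u, 3u, 4u) == sum_width2(1u, 2u, 3u, 4u));
    assert(sum_linear(~0u, 1u, 2u, 3u) == sum_width2(~0u, 1u, 2u, 3u));
}
```

The second form needs one extra temporary, which matches the trade-off described above: the linear form uses the fewest registers but has the greatest tree height.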