Ilya, thanks for posting this!  This patch is also useful on powerpc64.
Applying it fixed a performance degradation in bwaves caused by a loss of
reassociation somewhere between 4.5 and 4.6 (still tracking it down).
When we apply -ftree-reassoc-width=2 to bwaves, the better code
generation returns.
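
For readers following along, here is a rough hand-written sketch of the
difference between the linear form and a width-2 form of the same
expression. It is only an illustration: the function names are made up,
and the exact shape that the patched reassoc pass emits is not shown in
this thread. Unsigned multiplication is used because it wraps on
overflow, so regrouping the operands cannot change the result.

```c
/* Linear form: every multiply depends on the previous one, so the
   dependency chain is 7 multiplies long.  */
static unsigned product_linear(const unsigned v[8])
{
    return ((((((v[0] * v[1]) * v[2]) * v[3]) * v[4]) * v[5]) * v[6]) * v[7];
}

/* Width-2 form (illustrative): two independent chains that a pipelined
   multiplier can overlap, joined by one final multiply.  The longest
   dependency chain is 4 multiplies instead of 7.  */
static unsigned product_two_chains(const unsigned v[8])
{
    unsigned even = ((v[0] * v[2]) * v[4]) * v[6];  /* chain A */
    unsigned odd  = ((v[1] * v[3]) * v[5]) * v[7];  /* chain B */
    return even * odd;
}
```

Both functions compute the same value; the second trades one extra live
register for a shorter critical path, which is the trade-off described
in the quoted message below.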

Bill

On Tue, 2011-07-12 at 16:30 +0400, Илья Энкович wrote:
> Hello,
> 
> Here is a patch that addresses a missed optimization opportunity in
> the tree reassoc phase.
> 
> Currently the tree reassoc phase always generates a linear form, which
> requires the fewest registers but has the greatest tree height and
> does not allow computation to be performed in parallel. This may be
> critical for performance if the required operations have high latency
> but can be pipelined (i.e. few execution units or low throughput). This
> problem becomes important on current Atom processors, which are
> in-order and have many such instructions: IMUL and scalar SSE FP
> instructions.
> 
> This patch introduces a new feature in the tree reassoc phase that
> generates a computation tree of reduced height, allowing a few
> long-latency instructions to execute in parallel. It changes only one
> part of reassociation: rewrite_expr_tree. The level of parallelism is
> controlled via a target hook and/or a command-line option.
> 
> The new feature is enabled by default for Atom only. The main gain is
> in the CFP2000 geomean on Atom: +3.04% for 32-bit and +0.32% for 64-bit.
> 
> Bootstrapped and checked on x86_64-linux.
> 
> Thanks,
> Ilya
> --
> gcc/
> 
> 2011-07-12  Enkovich Ilya  <ilya.enkov...@intel.com>
> 
>       * target.def (reassociation_width): New hook.
> 
>       * doc/tm.texi.in (reassociation_width): New hook documentation.
> 
>       * doc/tm.texi (reassociation_width): Likewise.
> 
>       * hooks.h (hook_int_const_gimple_1): New default hook.
> 
>       * hooks.c (hook_int_const_gimple_1): Likewise.
> 
>       * config/i386/i386.h (ix86_tune_indices): Add
>       X86_TUNE_REASSOC_INT_TO_PARALLEL and
>       X86_TUNE_REASSOC_FP_TO_PARALLEL.
> 
>       (TARGET_REASSOC_INT_TO_PARALLEL): New.
>       (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
> 
>       * config/i386/i386.c (initial_ix86_tune_features): Add
>       X86_TUNE_REASSOC_INT_TO_PARALLEL and
>       X86_TUNE_REASSOC_FP_TO_PARALLEL.
> 
>       (ix86_reassociation_width): Implementation of
>       new hook for i386 target.
> 
>       * common.opt (ftree-reassoc-width): New option added.
> 
>       * tree-ssa-reassoc.c (get_required_cycles): New function.
>       (get_reassociation_width): Likewise.
>       (rewrite_expr_tree_parallel): Likewise.
> 
>       (reassociate_bb): Now checks reassociation width to be used
>       and calls rewrite_expr_tree_parallel instead of rewrite_expr_tree
>       if needed.
> 
>       (pass_reassoc): TODO_remove_unused_locals flag added.
> 
> gcc/testsuite/
> 
> 2011-07-12  Enkovich Ilya  <ilya.enkov...@intel.com>
> 
>       * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
>       -ftree-reassoc-width=1.
> 
>       * gcc.dg/tree-ssa/reassoc-24.c: New test.
>       * gcc.dg/tree-ssa/reassoc-25.c: Likewise.

