Hello,

Here is a patch that addresses a missed optimization opportunity in the tree reassoc phase.

Currently the tree reassoc phase always generates a linear form, which requires the fewest registers but has the greatest tree height and does not allow computation to be performed in parallel. This may be critical for performance if the required operations have high latency but can be pipelined (i.e. few execution units or low throughput). The problem becomes important on current Atom processors, which are in-order and have many such instructions: IMUL and scalar SSE FP instructions.

This patch adds a new feature to the tree reassoc phase: it generates a computation tree with reduced height, which allows several long-latency instructions to be performed in parallel. It changes only one part of reassociation: rewrite_expr_tree. The level of parallelism is controlled via a target hook and/or a command line option. The new feature is enabled by default for Atom only.

The patch mostly benefits the CFP2000 geomean on Atom: +3.04% for 32-bit and +0.32% for 64-bit.

Bootstrapped and checked on x86_64-linux.

Thanks,
Ilya
--
gcc/

2011-07-12  Enkovich Ilya  <ilya.enkov...@intel.com>

	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): New hook documentation.
	* doc/tm.texi (reassociation_width): Likewise.
	* hooks.h (hook_int_const_gimple_1): New default hook.
	* hooks.c (hook_int_const_gimple_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of new hook for
	i386 target.
	* common.opt (ftree-reassoc-width): New option added.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(reassociate_bb): Now checks the reassociation width to be used
	and calls rewrite_expr_tree_parallel instead of rewrite_expr_tree
	if needed.
	(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-12  Enkovich Ilya  <ilya.enkov...@intel.com>

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
	-ftree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.
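
To illustrate the difference between the linear form and the reduced-height form described above, here is a small standalone C sketch. It is not code from the patch: the names sum_linear, sum_balanced and chain_length are hypothetical, and chain_length is only a simplified model (assuming at most `width` independent operations can be started per step), not the logic used in tree-ssa-reassoc.c.

```c
/* Linear form: every addition depends on the previous result, so
   8 operands give a dependency chain of 7 additions.  */
unsigned
sum_linear (const unsigned *v)
{
  return ((((((v[0] + v[1]) + v[2]) + v[3]) + v[4]) + v[5]) + v[6]) + v[7];
}

/* Reduced-height form: independent pairs are combined first, so the
   longest dependency chain is only 3 additions.  The additions within
   one level do not depend on each other and may overlap in a pipeline.  */
unsigned
sum_balanced (const unsigned *v)
{
  unsigned t0 = v[0] + v[1];
  unsigned t1 = v[2] + v[3];
  unsigned t2 = v[4] + v[5];
  unsigned t3 = v[6] + v[7];
  unsigned u0 = t0 + t1;
  unsigned u1 = t2 + t3;
  return u0 + u1;
}

/* Hypothetical model: number of steps needed to combine OPS_NUM
   operands when at most WIDTH independent binary operations can be
   started in each step.  Each operation replaces two values by one.  */
int
chain_length (int ops_num, int width)
{
  int steps = 0;
  int remaining = ops_num;

  while (remaining > 1)
    {
      int pairs = remaining / 2;                    /* independent pairs available */
      int issued = pairs < width ? pairs : width;   /* limited by the width */
      remaining -= issued;
      steps++;
    }
  return steps;
}
```

In this model, 8 operands need 7 steps with a width of 1 (the linear form), 4 steps with a width of 2, and 3 steps with a width of 4. Both sum functions return the same value for unsigned operands; the balanced form trades a few extra temporaries for a shorter dependency chain.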
PR44382.diff