Hello Richard,

Thanks for the review!
> The fact that you have to adjust gcc.dg/tree-ssa/pr38533.c looks problematic
> to me.  Can you investigate the problem report, look at the generated
> code with the atom default of the param and see whether it's still
> reasonably "fixed" (maybe you'd already done this)?

This test was created as a regression test for a problem in
linearize_expr_tree, which moves all statements down to the first modified
one during reassociation, increasing register pressure.  The test has many
definitions which are all ORed together like this:

  def r1
  def r2
  s = r1 or r2
  def r3
  s = s or r3
  def r4
  s = s or r4
  ...

and it checks that register pressure is not increased after reassociation by
a simple scan for two sequential defs.  If we use a reassociation width
higher than 1, the test fails, because we have to increase register pressure
to get parallelism and end up with code like this:

  def r1
  def r2
  def r3
  t1 = r1 or r2
  s = s or r3
  def r4
  def r5
  s = s or t1
  t1 = r4 or r5
  ...

So I had to adjust the test.

> +  /* Check if we may use less width and still compute sequence for
> +     the same time.  It will allow us to reduce registers usage.  */
> +  while (width > 1
> +         && get_required_cycles (ops_num, width - 1) == cycles_best)
> +    width--;
>
> I suppose get_required_cycles () is monotonic in width?  Would it
> make sense to binary search the best width then (I realize the
> default is 1 and the only target differing has 2, but ...)?  Maybe at
> least add a comment to this effect?  Or not decrement by one
> but instead divide by two on each iteration (which is the same for 1 and 2
> ...)?
> It's also all a mapping that is constant - we should be able to
> pre-compute it for all values, eventually "compressing" it to a
> much simpler formula width = f (cpu_width, ops_num)?

I replaced the sequential search with a binary search.  I did not pay much
attention to this function because I do not think it is really time
consuming compared to other parts of the reassociation pass.  Am I too
optimistic?
If you think it might significantly affect compile time, I can introduce a
table of pre-computed values (or do it later as a separate fix).

I made all the other fixes as you suggested.

Bootstrapped and checked on x86_64-linux.

Thanks,
Ilya
---
gcc/

2011-08-31  Enkovich Ilya  <ilya.enkov...@intel.com>

	PR middle-end/44382
	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): Likewise.
	* doc/tm.texi (reassociation_width): Likewise.
	* doc/invoke.texi (tree-reassoc-width): New param documented.
	* hooks.h (hook_int_uint_mode_1): New default hook.
	* hooks.c (hook_int_uint_mode_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of new hook for
	i386 target.
	* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(swap_ops_for_binary_stmt): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(rewrite_expr_tree): Refactored.  Part of code moved into
	swap_ops_for_binary_stmt.
	(reassociate_bb): Now checks reassociation width to be used
	and calls rewrite_expr_tree_parallel instead of
	rewrite_expr_tree if needed.

gcc/testsuite/

2011-08-31  Enkovich Ilya  <ilya.enkov...@intel.com>

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
	--param tree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.
PR44382.diff