On Tue, 30 Nov 2021, Jakub Jelinek wrote:

> Hi!
> 
> Seems simplify_associate_operation is quadratic, which isn't a big deal
> for use during combine and other similar RTL passes, because those never
> try to combine expressions from more than a few instructions and because
> those instructions need to be recognized the machine description also bounds
> how many expressions can appear in there.

Well ...

> var-tracking has depth limits only for some cases and unlimited depth
> for the vt_expand_loc though:
> /* This is the value used during expansion of locations.  We want it
>    to be unbounded, so that variables expanded deep in a recursion
>    nest are fully evaluated, so that their values are cached
>    correctly.  We avoid recursion cycles through other means, and we
>    don't unshare RTL, so excess complexity is not a problem.  */
> #define EXPR_DEPTH (INT_MAX)
> /* We use this to keep too-complex expressions from being emitted as
>    location notes, and then to debug information.  Users can trade
>    compile time for ridiculously complex expressions, although they're
>    seldom useful, and they may often have to be discarded as not
>    representable anyway.  */
> #define EXPR_USE_DEPTH (param_max_vartrack_expr_depth)
> 
> IMO for very large expressions it isn't worth trying to reassociate though,
> in fact e.g. for the new testcase below keeping it as is has bigger chance
> of generating smaller debug info which the dwarf2out.c part of the change
> tries to achieve - if a binary operation has the same operands, we can
> use DW_OP_dup and not bother computing the possibly large operand again.
> 
> This patch punts if the associate operands contain together more than
> 64 same operations, which can happen only during var-tracking.
> During bootstrap/regtest on x86_64-linux and i686-linux, this triggers
> only on the new testcase and on gcc.dg/torture/pr88597.c.
> I think given the 16 element static buffer in subrtx_iterator::array_type
> it shouldn't slow down the common case of small expressions, but have
> been wondering whether we shouldn't have some in_vartrack global flag
> or guard it with
> (current_pass && strcmp (current_pass->name, "vartrack") == 0).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Another possibility to deal with the power expressions in debug info
> would be to introduce some new RTL operation for the pow{,i} (x, n)
> case, allow that solely in debug insns and expand those into DWARF
> using a loop.  But that seems like quite a lot of work for something rarely
> used (especially when powi for larger n is only useful for 0 and 1 inputs
> because anything else overflows).

I wonder given we now have 'simplify_context' whether we can
track a re-association budget we can eat from.  At least your
code to determine whether the expression is too large is
quadratic as well (but bound to 64, so just a very large constant
overhead for an outermost expression of size 63).  We already
have a mem_depth there, so just have reassoc_times and punt
if that reaches --param max-simplify-reassoc-times, incrementing
it each time simplify_associative_operation is entered?

Thanks,
Richard.

> 2021-11-30  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR rtl-optimization/102356
>       * simplify-rtx.c: Include rtl-iter.h.
>       (simplify_associative_operation): Don't reassociate very large
>       expressions with 64 or more CODE subrtxes in the operands.
>       * dwarf2out.c (mem_loc_descriptor): Optimize binary operation
>       with both operands the same using DW_OP_dup.
> 
>       * gcc.dg/pr102356.c: New test.
> 
> --- gcc/simplify-rtx.c.jj     2021-11-05 00:43:22.576624649 +0100
> +++ gcc/simplify-rtx.c        2021-11-29 19:46:29.674750656 +0100
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
>  #include "selftest-rtl.h"
>  #include "rtx-vector-builder.h"
>  #include "rtlanal.h"
> +#include "rtl-iter.h"
>  
>  /* Simplification and canonicalization of RTL.  */
>  
> @@ -2263,9 +2264,40 @@ simplify_context::simplify_associative_o
>  {
>    rtx tem;
>  
> +  if (GET_CODE (op0) == code || GET_CODE (op1) == code)
> +    {
> +      /* During vartrack, the expressions can grow arbitrarily large.
> +      Reassociation isn't really useful for larger expressions
> +      and can be very compile time expensive.  */
> +      unsigned count = 0;
> +      subrtx_iterator::array_type array;
> +      FOR_EACH_SUBRTX (iter, array, op0, NONCONST)
> +     {
> +       const_rtx x = *iter;
> +       if (GET_CODE (x) == code)
> +         {
> +           if (count++ >= 64)
> +             return NULL_RTX;
> +         }
> +       else
> +         iter.skip_subrtxes ();
> +     }
> +      FOR_EACH_SUBRTX (iter, array, op1, NONCONST)
> +     {
> +       const_rtx x = *iter;
> +       if (GET_CODE (x) == code)
> +         {
> +           if (count++ >= 64)
> +             return NULL_RTX;
> +         }
> +       else
> +         iter.skip_subrtxes ();
> +     }
> +    }
> +
>    /* Linearize the operator to the left.  */
>    if (GET_CODE (op1) == code)
>      {
>        /* "(a op b) op (c op d)" becomes "((a op b) op c) op d)".  */
>        if (GET_CODE (op0) == code)
>       {
> --- gcc/dwarf2out.c.jj        2021-11-29 14:24:14.053634713 +0100
> +++ gcc/dwarf2out.c   2021-11-29 19:44:54.192113401 +0100
> @@ -16363,6 +16363,15 @@ mem_loc_descriptor (rtx rtl, machine_mod
>      do_binop:
>        op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
>                               VAR_INIT_STATUS_INITIALIZED);
> +      if (XEXP (rtl, 0) == XEXP (rtl, 1))
> +     {
> +       if (op0 == 0)
> +         break;
> +       mem_loc_result = op0;
> +       add_loc_descr (&mem_loc_result, new_loc_descr (DW_OP_dup, 0, 0));
> +       add_loc_descr (&mem_loc_result, new_loc_descr (op, 0, 0));
> +       break;
> +     }
>        op1 = mem_loc_descriptor (XEXP (rtl, 1), mode, mem_mode,
>                               VAR_INIT_STATUS_INITIALIZED);
>  
> --- gcc/testsuite/gcc.dg/pr102356.c.jj        2021-11-29 19:57:47.061075856 
> +0100
> +++ gcc/testsuite/gcc.dg/pr102356.c   2021-11-29 19:57:26.852364489 +0100
> @@ -0,0 +1,33 @@
> +/* PR rtl-optimization/102356 */
> +/* { dg-do compile { target int32plus } } */
> +/* { dg-options "-O3 -g" } */
> +
> +signed char a = 0;
> +unsigned char b = 9;
> +unsigned long long c = 0xF1FBFC17225F7A57ULL;
> +int d = 0x3A6667C6;
> +
> +unsigned char
> +foo (unsigned int x)
> +{
> +  unsigned int *e = &x;
> +  if ((c /= ((0 * (*e *= b)) <= 0)))
> +    ;
> +  for (d = 9; d > 2; d -= 2)
> +    {
> +      c = -2;
> +      do
> +     if ((*e *= *e))
> +       {
> +         a = 4;
> +         do
> +           {
> +             a -= 3;
> +             if ((*e *= *e))
> +               b = 9;
> +           }
> +         while (a > 2);
> +       }
> +      while (c++);
> +    }
> +}
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)

Reply via email to