[Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function

rguenth at gcc dot gnu.org via Gcc-bugs Fri, 19 Jul 2024 07:19:54 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116002


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |14.1.1
          Component|c                           |rtl-optimization
     Ever confirmed|0                           |1
            Summary|GCC Compiler Hang with      |GCC Compiler time-hog with
                   |Recursive Macros in         |large basic block in
                   |Function                    |Function
   Last reconfirmed|                            |2024-07-19
           Keywords|                            |compile-time-hog
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
With only 'E E' it's

 CPROP                              :  18.28 ( 62%)

with 'E E E'

 dead store elim1                   :  17.04 ( 22%)
 CPROP                              :  50.19 ( 65%)

with 'E E E E'

 dead store elim1                   :  36.41 ( 23%)
 CPROP                              : 106.68 ( 67%)

this is the usual gcse memory/compile-time hog with its (and DF) bitmap
operations.
We already disable it via gcse_or_cprop_is_too_expensive but the heuristic
doesn't trigger here - it's few basic blocks only.  We end up

  <bb 2> [local count: 1073741824]:
  if (c_11666(D) <= 12344)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870912]:
  d_11667 = c_11666(D) % 12345;
  _1 = d_11667 * 12345;
  e_11668 = _1 ^ c_11666(D);
  _2 = c_11666(D) * d_11667;
  _3 = _2 % e_11668; 
  f_11669 = _3 % 12345;
  _4 = f_11669 * 12345;
  e_11670 = _4 ^ f_11669;
  _5 = f_11669 * f_11669;
  _6 = _5 % e_11670;
  f_11671 = _6 % 12345;
  _7 = f_11671 * 12345;
  e_11672 = _7 ^ f_11671;
  _8 = f_11671 * f_11671;
  _9 = _8 % e_11672;
...
  _11664 = _11663 % e_19442;
  f_19443 = _11664 % 12345;

  <bb 4> [local count: 1073741824]:
  # c_11665 = PHI <c_11666(D)(2), f_19443(3)>
  return c_11665;

thus very many pseudos and insns but few basic-blocks.  I believe there's
a duplicate.

Samples: 114K of event 'cycles:P', Event count (approx.): 121898207258          
Overhead       Samples  Command  Shared Object         Symbol                   
  61.00%         70222  cc1      cc1                   [.]
_Z22rtx_equal_for_cselib_1P7rtx_defS0_12machine_modei
  10.17%         11704  cc1      cc1                   [.]
_Z13cselib_lookupP7rtx_def12machine_modeiS1_
   6.42%          7400  cc1      cc1                   [.]
_ZN10hash_tableI13cselib_hasherLb0E11xcallocatorE19find_slot_with
   4.75%          5251  cc1      cc1                   [.]
_ZL10topo_visitP16constraint_graphR3vecIj7va_heap6vl_ptrEP17simpl
   2.91%          3219  cc1      cc1                   [.]
_ZL11solve_graphP16constraint_graph

ah, I've seen this as well - the CSELIB hashing is a joke;  I've run into this
earlier.  It certainly will produce collisions a lot
with a lot of expressions with the same RTX code like this.  Ah, it was
cse.cc hashing that was even worse.  But CSELIB hashing also uses + for
mixing.

[Bug rtl-optimization/116002] GCC Compiler time-hog with large basic block in Function

Reply via email to