[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce

amacleod at redhat dot com via Gcc-bugs Mon, 19 Apr 2021 07:09:26 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100081


--- Comment #12 from Andrew Macleod <amacleod at redhat dot com> ---
(In reply to Richard Biener from comment #10)
> (In reply to Andrew Macleod from comment #8)
> > OMG.  Don't bother reducing. I can see the problem.
> > 
> > EVRP is fine, but when wrestrict runs, its quite late, and the CFG has
> > 
> > <bb 560> [local count: 28382607]:
> >   <...>
> >   _571 = _61 >= _593;
> >   _3583 = &arr_724 + _3992;
> >   _2220 = _831 <= _3583;
> >   _47 = _571 | _2220;
> >   _2935 = _376 * 2;
> >   _3966 = &arr_725 + _2935;
> >   _3024 = _61 >= _3966;
> >   _4219 = _3992 * 2;
> >   _4218 = &arr_725 + _4219;
> >   _1836 = _831 <= _4218;
> >   _3080 = _1836 | _3024;
> > <...>
> >   _5348 = _5347 & _32080;
> >   _5349 = _5348 & _32151;
> >   _5350 = _5349 & _32176;
> >   _5351 = _5350 & _32488;
> >   _5352 = _5351 & _33691;
> >   _5353 = _5352 & _33762;
> >   _5354 = _5353 & _34753;
> >   _35662 = _5354 & _34824;
> >   if (_35662 != 0)
> >     goto <bb 561>; [90.00%]
> >   else
> >     goto <bb 1510>; [10.00%]
> > 
> > Its a 7200 stmt basic block, made up of calculations and 2614 ORs and 1480
> > ANDs.
> > 
> > A request is made for a range which can be exported from this block, and
> > ranger is dutifully trying everything it can to process those blocks.
> > 
> >  Each AND/OR is a logical expression which evaluates a TRUE and FALSE range
> > for each operands, so it calculates up to 4 ranges for each pair of
> > operands. I knew this could get out of hand in pathological cases, so we
> > introduced a logical cache to help resolve this and avoid extra work.  Its
> > actually making this one worse I think.
> 
> Hmm, still the overall work should be linear to produce ranges for all
> of the SSA defs in this BB, no?  As heuristic you might want to avoid
> producing ranges for single-use defs, like those that are just used in
> another & or | combination?  Wrestrict should only be interested in
> ranges for the "tails" of this &| tree (for example _61 in _61 >= _3966).
> 

Since the direction is bottom up, it is no longer linear. This has probably
never been explain very well.  lets make up a simple example:

    if (x > 2 && x < 10 || x == 15)
for unsigned x turns into:

    _1 = x_8(D) + 4294967293;
    _2 = _1 <= 6;
    _3 = x_8(D) == 15;
    _4 = _2 | _3;
    if (_4 != 0)
      goto <bb 3>; [INV]
    else
      goto <bb 5>; [INV]

and we can calculate the following ranges (note none of them are calculated in
advance, only if asked/required) :

2->3  (T) _4 :  bool [1, 1]
2->3  (T) x_8(D) :      unsigned int [3, 9][15, 15]
2->5  (F) _1 :  unsigned int [7, +INF]
2->5  (F) _2 :  bool [0, 0]
2->5  (F) _3 :  bool [0, 0]
2->5  (F) _4 :  bool [0, 0]
2->5  (F) x_8(D) :      unsigned int [0, 2][10, 14][16, +INF]

When a client asks for the range of x_8 on the true edge, we have to solve
[1,1] = _4 != 0, which in turn feeds back to the def of _4 as:
[1,1] = _2 | _3

There are 3 possible ways this branch can be taken..
a) _2 = [1, 1], _3 = [1, 1]
b) _2 = [0, 0], _3 = [1, 1]
c) _2 = [1, 1], _3 = [0, 0]

In order to calculate a precise range for x_8, we need to calculate the range
of x_8 for both possible values of _2 and _3  and combine them.. 

I wont trace the actual calculation for each one, but it boils down to
_2 = [0, 0] produces x_8 = ~[3, 9]
_2 = [1, 1] produces x_8 = [3, 9]
_3 = [0, 0] produces x_8 = ~[15, 15]
_3 = [1, 1] produces x_8 = [15, 15]

Then we combine them with the 2 possible combinations, and produce the final
range of unsigned int [3, 9][15, 15].

Once upon a time I tried to "simplify" this a couple of different ways, but in
more complex situations, it inevitably fails to produce the correct range.. so
instead, we simply do the calculations exactly as the statement requires and
combine them.

The logical OR spawned 4 requests for the range of x_8.. so when these logical
expressions feed each other, we get the exponential growth of computations.

The logical cache was suppose to resolve this by caching the true and false
values of x_8 for _2 and _3 eliminating the need to recalculate them.   More
complex cases with many ssa_names feeding through a boolean condition cause it
to not function well.


As for single use-use defs.. There is nothing special about them. We never
produce ranges for anything that is not used an an outgoing edge calculation,
regardless of how many uses there are.  Those are tagged and we simply use
their global value.

Furthermore, we never produce ranges for anything that isn't either explicitly
requested, or used in a calculation that affects an explicit request.

In this case for instance, I forget the name that restrict asked for, but lets
say it was  _3992.  we start at the bottom of the block and work back to the
definition of _3992.  During the evaluation, we go through many single-use
cases which we will need the ranges for as they feed the condition at the
bottom and may therefore affect the outcome.  Anything above _3992's def is
never evaluated.

Up until now, I haven't really throttled anything.. Since we only calculate
ranges for things that are actually useful, that helps to compensate for the
time spent doing the computations.

We knew that the logical combination was potentially an issue, and
thought/hoped the cache would contain bad behaviour...  but not in this case.

I plan to focus more time in the next release trying to evaluate when a good
time to "give up" is, and maybe find additional efficiencies, but for now, just
limited the depth of logical evaluation should suffice.

[Bug tree-optimization/100081] [11 Regression] Compile time hog in irange since r11-4135-ge864d395b4e862ce

Reply via email to