On 4/19/21 3:40 AM, Jakub Jelinek wrote:
On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote:
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -29,6 +29,36 @@ along with GCC; see the file COPYING3.  If not see
  #include "gimple-pretty-print.h"
  #include "gimple-range.h"
+// Limit the nested depth thru logical expressions which GORI will build
+// def chains.
+#define LOGICAL_LIMIT  6
Such limits should be in params.def so that users can override them.
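(For illustration: limits like this now live in params.opt rather than params.def; an entry for this one might look like the following sketch, with a hypothetical param name:

-param=ranger-logical-depth=
Common Joined UInteger Var(param_ranger_logical_depth) Init(6) IntegerRange(1, 999) Param Optimization
Maximum depth of logical expression evaluation ranger will look through when evaluating outgoing edge ranges.
)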

+// Return TRUE if GS is a logical && or || expression.
+
+static inline bool
+is_gimple_logical_p (const gimple *gs)
+{
+  // Look for boolean and/or condition.
+  if (gimple_code (gs) == GIMPLE_ASSIGN)
   if (is_gimple_assign (gs))

is the normal spelling of this check.
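(For context, the helper presumably continues along these lines -- a sketch completing the quoted fragment with is_gimple_assign spelled as suggested; the exact body is cut off in the quote:

static inline bool
is_gimple_logical_p (const gimple *gs)
{
  // Look for a boolean and/or condition.
  if (is_gimple_assign (gs))
    switch (gimple_expr_code (gs))
      {
      case TRUTH_AND_EXPR:
      case TRUTH_OR_EXPR:
        return true;

      case BIT_AND_EXPR:
      case BIT_IOR_EXPR:
        // Bitwise AND/OR on single-bit (boolean) types is logical too.
        if (types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (gs)),
                                boolean_type_node))
          return true;
        break;

      default:
        break;
      }
  return false;
}
)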

But more importantly, isn't 6 too low for logicals, and wouldn't it be
better to put a cap not on the number of seen logicals, but on how many
SSA_NAMEs are handled in a query?  Because it is easy to construct
testcases where the logical limit would not trigger but the operands
of comparisons used in logicals would be thousands of arithmetic/bitwise
statements etc.  And it isn't just artificial, we have various examples
of generated tests in bugzilla that typically can't be compiled in
reasonable time at -O2 and need to use -O1 and have huge basic blocks with
very deep chains.

The chains aren't typically the issue; that's a linear calculation. I gave a small example in the PR of why the logical expressions cause exponential growth.

Most values are only ever calculated once and are cached. The only time we do this backward calculation is when a range is changed by the condition at the bottom of a block and used outside the block, so the edge will have a different value than the range within the block.

I don't think 6 logicals is too low... in fact, 4 is probably plenty 99.9999% of the time. This is not during a forward walk... we will walk through every logical forward and calculate ranges as appropriate. The limit only applies when trying to determine a refined range on an edge. For example:

<bb 2> [local count: 1073741823]:
  _1 = x_20(D) == 10;
  _2 = y_21(D) == 10;
  _3 = _1 & _2;
  _4 = x_20(D) == 20;
  _5 = y_21(D) == 20;
  _6 = _4 & _5;
  _7 = x_20(D) == 30;
  _8 = y_21(D) == 30;
  _9 = _7 & _8;
  _10 = x_20(D) == 40;
  _11 = y_21(D) == 40;
  _12 = _10 & _11;
  _13 = x_20(D) == 50;
  _14 = y_21(D) == 50;
  _15 = _13 & _14;
  _16 = x_20(D) == 60;
  _17 = y_21(D) == 60;
  _18 = _16 & _17;
  _19 = _3 | _6;      // depth 3
  _20 = _9 | _12;     // depth 3
  _21 = _15 | _18;    // depth 2
  _22 = _19 | _20;    // depth 2
  _23 = _21 | _22;    // depth 1

  if (_23 != 0)
    goto <bb 8>; [50.00%]
  else
    goto <bb 9>; [50.00%]

We can still look back through these conditions and find a range for x_20 or y_21 of [10,10][20,20][30,30][40,40][50,50][60,60], because the logical nesting depth never exceeds 3 when walking back from the branch to the use of x_20 or y_21.
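For reference, hypothetical C source along these lines could produce a block like the one above, assuming the short-circuit operators get flattened into straight-line & / | and the ORs reassociated into a balanced tree:

int f (int x, int y)
{
  return (x == 10 && y == 10) || (x == 20 && y == 20)
         || (x == 30 && y == 30) || (x == 40 && y == 40)
         || (x == 50 && y == 50) || (x == 60 && y == 60);
}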

We still evaluate everything walking forward, i.e., we get global ranges for everything in the block and evaluate the final condition based on the block values.

If we added a whole bunch more logicals, we'd still do that forward processing in EVRP. This change would then say it's not worth trying to calculate back through that many logical expressions, so we'd just say x_20 and y_21 are whatever their ranges on entry to the block were, rather than trying to refine them on an outgoing edge. So we would get less precise ranges for x_20 and y_21 on those edges.

Sure, a similar argument can be made that if we go through a long list of SSA_NAMEs in a straight-line calculation, the odds of getting a refined range decrease... but for example:

a_2 = a_1 + 1
a_3 = a_2 + 1
<...>
a_44 = a_43 + 1
if (a_44 > 100)
  goto <bb 7>
<bb 7>
  if (a_1 > 50)

Asking for the range of a_1 in bb 7 causes a lookup to say "a_1 is an export and can be recalculated on the edge 2->7". So what is it? We walk back from a_44 > 100, evaluating those names based on the edge value of a_44, [101, +INF], until we get to a_2 = a_1 + 1. At that point a_2 will have an edge range of [59, +INF - 42] or something to that effect, and we return the evaluated a_1 range on the edge as [58, +INF - 43], which lets us fold the stmt.
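A minimal C sketch of that pattern, with the a_2..a_44 chain collapsed into a single +43 (names hypothetical):

int g (int a)            /* 'a' plays the role of a_1 */
{
  int v = a + 43;        /* stands in for the a_2 .. a_44 chain */
  if (v > 100)           /* on this edge v is [101, +INF], so a is [58, +INF - 43] */
    return a > 50;       /* always true here; ranger can fold this to 1 */
  return 0;
}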

We only do that once; bb 7 will have the value of a_1 cached since it was explicitly requested. These really long back-calculations don't happen that often, so I haven't seen the need to throttle them. We also don't cache the intervening values (a_3 through a_44)... odds are they don't get asked for outside the block, so caching them would just waste memory. We only ever cache what's actually asked for.

So a lot depends on what's actually requested. Most of the time, these really long blocks do not have many, if any, uses outside the block, in which case we will never even do a wind-back range request. If we find such cases, then maybe a limit is appropriate. It may also be that we can find good heuristics for giving up early if we can detect that the calculation isn't going anywhere... a topic of research for stage 1.


One thing is evrp/vrp passes where we want to compute ranges of all
SSA_NAMEs and if caching is done, compute range of everything at most once
(or with some bounded iteration counts for loops or whatever ranger uses).

Everything is only ever calculated once, EXCEPT when a condition at the bottom of a block affects the value on an edge, and we only go back and recalculate those when requested from other blocks.

In the above case, if we didn't ask for the range of a_1 in bb 7, we wouldn't calculate the range of anything on those edges; we'd just use the global ranges. We only do work which is actually required... and logical expression evaluation is the only thing in ranger which is currently not linear. I thought the logical cache would be sufficient, but apparently not in sufficiently complex cases.



And another thing is where passes ask for ranges; perhaps just those should use the limit, or should use a lower limit.


So again, in summary: this limit doesn't affect the calculation of global ranges at all. It only limits how far we will look back through logical expressions to try to refine values on edges when they can be recalculated, just to eliminate the exponential behaviour. And I would daresay every range we get this way is one we wouldn't have gotten any other way anyway.

I may not have explained this well... it's the most complex part of how ranger works and is at the heart of the on-demand system.

Andrew
