On 4/19/21 3:40 AM, Jakub Jelinek wrote:
> On Sun, Apr 18, 2021 at 08:11:21PM -0400, Andrew MacLeod via Gcc-patches wrote:
>> --- a/gcc/gimple-range-gori.cc
>> +++ b/gcc/gimple-range-gori.cc
>> @@ -29,6 +29,36 @@ along with GCC; see the file COPYING3. If not see
>> #include "gimple-pretty-print.h"
>> #include "gimple-range.h"
>> +// Limit the nested depth thru logical expressions which GORI will build
>> +// def chains.
>> +#define LOGICAL_LIMIT 6
> Such limits should be in params.def so that users can override them.
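(For reference, in current trunk such knobs live in gcc/params.opt rather than params.def.  A sketch of what an entry could look like -- the parameter name and bounds here are hypothetical:)

```
-param=ranger-logical-depth=
Common Joined UInteger Var(param_ranger_logical_depth) Init(6) IntegerRange(1, 999) Param Optimization
Maximum depth of logical expression evaluation GORI will look back through
when discovering outgoing edge ranges.
```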
>> +// Return TRUE if GS is a logical && or || expression.
>> +
>> +static inline bool
>> +is_gimple_logical_p (const gimple *gs)
>> +{
>> + // Look for boolean and/or condition.
>> + if (gimple_code (gs) == GIMPLE_ASSIGN)
>   if (is_gimple_assign (gs))
> is the normal spelling of this check.
> But more importantly, isn't 6 too low for logicals, and wouldn't it be
> better to put a cap not on the number of seen logicals, but on how many
> SSA_NAMEs are handled in a query?  Because it is easy to construct
> testcases where the logical limit would not trigger but the operands
> of comparisons used in logicals would be thousands of arithmetic/bitwise
> statements etc.  And it isn't just artificial, we have various examples
> of generated tests in bugzilla that typically can't be compiled in
> reasonable time at -O2 and need to use -O1 and have huge basic blocks with
> very deep chains.
The chains aren't typically the issue; it's a linear calculation.  I
gave a small example in the PR of why the logical expressions cause an
exponential growth.
Most values are only ever calculated once and are cached.  The only time
we do this backward calculation is when a range is changed by the
condition at the bottom of a block and used outside the block, so the
edge will have a different value than the range within the block.
I don't think 6 logicals is too low... in fact 4 is probably plenty
99.9999% of the time.  This is not during a forward walk... we will
walk through every logical forward and calculate ranges as appropriate.
This limit only applies when trying to determine a refined range on an
edge.  So for example:
<bb 2> [local count: 1073741823]:
_1 = x_20(D) == 10;
_2 = y_21(D) == 10;
_3 = _1 & _2;
_4 = x_20(D) == 20;
_5 = y_21(D) == 20;
_6 = _4 & _5;
_7 = x_20(D) == 30;
_8 = y_21(D) == 30;
_9 = _7 & _8;
_10 = x_20(D) == 40;
_11 = y_21(D) == 40;
_12 = _10 & _11;
_13 = x_20(D) == 50;
_14 = y_21(D) == 50;
_15 = _13 & _14;
_16 = x_20(D) == 60;
_17 = y_21(D) == 60;
_18 = _16 & _17;
_19 = _3 | _6;    // depth 3
_20 = _9 | _12;   // depth 3
_21 = _15 | _18;  // depth 2
_22 = _19 | _20;  // depth 2
_23 = _21 | _22;  // depth 1
if (_23 != 0)
goto <bb 8>; [50.00%]
else
goto <bb 9>; [50.00%]
We can still look back through these conditions and could find a range
for x_20 or y_21 of [10,10][20,20][30,30][40,40][50,50][60,60], because
the logical nesting depth never exceeds 3 when walking back from the
branch to the use of x_20 or y_21.
We still evaluate everything walking forward, i.e., we get global
ranges for everything in the block and evaluate the final condition
based on the block values.
If we added a whole bunch more logicals, we'd still do that forward
processing in EVRP.  This change would then say, you know, it's not
worth trying to calculate back through that many logical expressions,
so we'll just say x_20 and y_21 are whatever their range on entry to
the block was, rather than trying to refine it on an outgoing edge.  So
we would then get less precise ranges for x_20 and y_21 on these edges.
Sure, a similar argument can be made that if we go through a long list
of SSA_NAMEs in a straight calculation, the odds of getting a refined
range decrease... but for example:
a_2 = a_1 + 1
a_3 = a_2 + 1
<...>
a_44 = a_43 + 1
if (a_44 > 100)
goto <bb 7>
<bb 7>
if (a_1 > 50)
Asking for the range of a_1 in bb_7 causes a lookup to say "a_1 is an
export and can be calculated on the edge 2->7"; so what is it?  We will
walk back from a_44 > 100, evaluating those names based on the edge
value of a_44 [101, +INF] until we get to a_2 = a_1 + 1.  At that
point, a_2 will have an edge range of [59, +INF - 42] or something to
that effect, it will return the evaluated a_1 range on the edge as
[58, +INF - 43], and we can fold the stmt.
We only do that once.  bb_7 will have the value of a_1 cached since it
was explicitly requested.  These really long back calculations don't
really happen that often, so I haven't seen the need to throttle them.
We also don't cache the intervening values (a_3 through a_44)... odds
are they don't get asked for outside the block, so it would just be a
memory waste.  We only ever cache what's actually asked for.
So a lot depends on what's actually requested.  Most of the time, these
really long blocks do not have many, if any, uses outside the block, in
which case we will never even do a wind-back range request.  If we find
such cases, then maybe a limit is appropriate.  It may also be that we
can find good heuristics for giving up early if we can detect that the
calculation isn't going anywhere... a topic of research for stage 1.
> One thing is evrp/vrp passes where we want to compute ranges of all
> SSA_NAMEs and if caching is done, compute range of everything at most once
> (or with some bounded iteration counts for loops or whatever ranger uses).
Everything is only ever calculated once, EXCEPT when a condition at the
bottom affects the value on an edge, and we only go back and
re-calculate those when requested from other blocks.
In the above case, if we didn't ask for the range of a_1 in bb_7... we
wouldn't calculate the range of anything on those edges; we'd just use
the global ranges.  We only do work which is actually required... and
the logical expression evaluation is the only thing in ranger which is
currently not linear.  I thought the logical cache would be sufficient,
but apparently not in sufficiently complex cases.
> And another thing is where passes ask for ranges, perhaps just those should
> use the limit or should use a lower limit.
So again, in summary: this limit doesn't affect the calculation of
global ranges at all; it only limits how far we will look back through
logical expressions to try to refine values on edges when they can be
recalculated... just to eliminate the exponential behaviour.  And I
would daresay every one of them we get this way is a range we wouldn't
have gotten any other way anyway.
I may not have explained this well... it's the most complex part of how
ranger works and is at the heart of the on-demand system.
Andrew