On 11/19/21 04:21, Richard Biener wrote:
On Thu, Nov 18, 2021 at 8:30 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
At issue here is the dynamic approach we currently use for outgoing edge
calculations. It isn't normally a problem, but once you get a very
large number of possible outgoing values (ie very long unrolled blocks)
with pairs of values on a statement, and individual queries for each
one, it becomes exponential if they relate to each other.
So. We previously introduced a param for PR 100081 which limited the
depth of logical expressions GORI would pursue because we always make 2
or 4 queries for each logical expression, which accelerated the
exponential behaviour much quicker.
This patch reuses that to apply to any statement which has 2 ssa-names,
as the potential for 2 queries can happen then as well.. leading to
exponential behaviour. Each statement we step back through with
multiple ssa-names decreases the odds of calculating a useful outgoing
range anyway. This will remove ridiculous behaviour like this PR
demonstrates.
Next release I plan to revamp GORI to cache outgoing values, which will
eliminate all this sort of behaviour, and we can remove all such
restrictions.
Bootstrapped on x86_64-pc-linux-gnu with no regressions. OK for trunk?
I think it's reasonable. SCEV tries to limit the overall number of SSA -> def
visits, capturing deep vs. wide in a more uniform (and single) way.
(--param scev-max-expr-size and the duplicate - huh - --param
scev-max-expr-complexity).
For SCEV running into the limit means fatal fail of the analysis, for ranger
it only means less refinement? So it might make a difference which path
you visit and which not (just thinking about reproducability of range analysis
when we say, swap operands of a stmt)?
Yes, Its means less refinement for "deeper" dependencies in the block..
It will not be affected by operand swapping because we will either look
at dependencies for both operands of a stmt, or neither. The path you
visit shouldn't make any difference. Note this has nothing to do with
generating ranges for those statements, this is only for refining
outgoing ranges created by the condition at the bottom of the block.
To be precise, we will stop looking for possible range changes once our
dependency search, which always starts from the bottom of the block,
encounters 6 stmts with multiple SSA operands. Statements with a single
SSA operand will continue to build the dependency chains.
so for instance, following just the chain for first operands of this
somewhat truncated listing:
<...>
_137 = (short int) j_131;
_138 = _129 & _137;
_139 = (int) _138;
j_140 = j_131 & _139; << depth 6 ... we don't go look at j_131
or _139 for further dependencies
_146 = (short int) j_140;
_147 = _138 & _146;
_148 = (int) _147;
j_149 = j_140 & _148; << depth 5
_155 = (short int) j_149;
_156 = _147 & _155;
_157 = (int) _156;
j_158 = j_149 & _157; << depth 4
_164 = (short int) j_158;
_165 = _156 & _164;
_166 = (int) _165;
j_167 = j_158 & _166; <<-- depth 3
_1 = (short int) j_167;
_3 = _1 & _165; <<-depth 2
_5 = (int) _3;
j_22 = _5 & j_167; <<-- depth 1
j_20 = j_22 + 4;
if (j_20 == 4)
goto <bb 3>;
We'll generate dependencies, and thus able to generate outgoing ranges
for all these ssa-names still until we hit a dependency depth of 6 (the
current default), which for one chain comes to an end at
j_140 = j_131 & _139
We will be able to generate outgoing ranges for j_131 and _139 on the
edges, but we will not try for any ssa_names used to define those.. ie,
_138 might not be able to have outgoing ranges generated for it in this
block.
working the j_20 : [4,4] range on the true edge back thru those
expressions, once we get that deep the odds of finding something of
value is remote :-)
We will also only generate these outgoing ranges when they are asked
for. Many of these names are not used outside the block (especially
back the deeper you go), and so never get asked for anyway... but you
could :-)
Anyway, this limit in no way introduces any kind of ordering variation..
Andrew
PS for amusement and information overload... it turns out in this hunk
of code we end up knowing that globally the range of most of these names
are [0, 4], and the actual ranges we COULD generate if asked:
3->5 (T) _1 : short int [0, 4]
3->5 (T) _3 : short int [0, 4]
3->5 (T) _5 : int [0, 4]
3->5 (T) j_20 : int [4, 4]
3->5 (T) j_22 : int [0, 0]
3->5 (T) _120 : short int [0, 4]
3->5 (T) _121 : int [0, 4]
3->5 (T) _128 : short int [0, 4]
3->5 (T) _129 : short int [0, 4]
3->5 (T) _130 : int [0, 4]
3->5 (T) j_131 : int [0, 4]
3->5 (T) _137 : short int [0, 4]
3->5 (T) _138 : short int [0, 4]
3->5 (T) _139 : int [0, 4]
3->5 (T) j_140 : int [0, 4]
3->5 (T) _146 : short int [0, 4]
3->5 (T) _147 : short int [0, 4]
3->5 (T) _148 : int [0, 4]
3->5 (T) j_149 : int [0, 4]
3->5 (T) _155 : short int [0, 4]
3->5 (T) _156 : short int [0, 4]
3->5 (T) _157 : int [0, 4]
3->5 (T) j_158 : int [0, 4]
3->5 (T) _164 : short int [0, 4]
3->5 (T) _165 : short int [0, 4]
3->5 (T) _166 : int [0, 4]
3->5 (T) j_167 : int [0, 4]
3->4 (F) _1 : short int [1, 4]
3->4 (F) _3 : short int [1, 4]
3->4 (F) _5 : int [1, 4]
3->4 (F) j_20 : int [5, 8]
3->4 (F) j_22 : int [1, 4]
3->4 (F) _120 : short int [1, 4]
3->4 (F) _121 : int [1, 4]
3->4 (F) _128 : short int [1, 4]
3->4 (F) _129 : short int [1, 4]
3->4 (F) _130 : int [1, 4]
3->4 (F) j_131 : int [1, 4]
3->4 (F) _137 : short int [1, 4]
3->4 (F) _138 : short int [1, 4]
3->4 (F) _139 : int [1, 4]
3->4 (F) j_140 : int [1, 4]
3->4 (F) _146 : short int [1, 4]
3->4 (F) _147 : short int [1, 4]
3->4 (F) _148 : int [1, 4]
3->4 (F) j_149 : int [1, 4]
3->4 (F) _155 : short int [1, 4]
3->4 (F) _156 : short int [1, 4]
3->4 (F) _157 : int [1, 4]
3->4 (F) j_158 : int [1, 4]
3->4 (F) _164 : short int [1, 4]
3->4 (F) _165 : short int [1, 4]
3->4 (F) _166 : int [1, 4]
3->4 (F) j_167 : int [1, 4]
So the chains go slightly deeper than my example, but the end result is
most of the names in the _30 thru _119 range on the false edge would
come back with [0,4] instead of [1,4] with this change.
The actual dependency chains from the dump:
Exports: _1 _3 _5 j_20 j_22 _120 _128 _129 j_131 _137 _138
_139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164
_165 _166 j_167
The exports are the names that ranger can potentially generate a range
on outgoing edges that may differ from the range in the block. It means
GORI might be able to take the condition at the bottom and adjust those
ranges. Any name not in the list will simply use the range it has in
the block. Adjusting that param value will change the number of these.
The dependency list for each ssa name is also shown: and we can see
how the higher in the block we go, the deeper in the dependency chain
from the bottom of the block we are, and thus there are less names in
the list for each name
_1 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167
_3 : _1 _120 _128 _129 j_131 _137 _138 _139 j_140
_146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167
_5 : _1 _3 _120 _128 _129 j_131 _137 _138 _139 j_140
_146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166 j_167
j_20 : _1 _3 _5 j_22 _120 _128 _129 j_131 _137 _138
_139 j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164
_165 _166 j_167
j_22 : _1 _3 _5 _120 _128 _129 j_131 _137 _138 _139
j_140 _146 _147 _148 j_149 _155 _156 _157 j_158 _164 _165
_166 j_167
_129 : _120 _128
_137 : j_131
_138 : _120 _128 _129 j_131 _137
j_140 : j_131 _139
_146 : j_131 _139 j_140
_147 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_148 : _147
j_149 : j_131 _139 j_140 _147 _148
_155 : j_131 _139 j_140 _147 _148 j_149
_156 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_147 _148 j_149 _155
_157 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_147 _148 j_149 _155 _156
j_158 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_147 _148 j_149 _155 _156 _157
_164 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_147 _148 j_149 _155 _156 _157 j_158
_165 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_147 _148 j_149 _155 _156 _157 j_158 _164
_166 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_147 _148 j_149 _155 _156 _157 j_158 _164 _165
j_167 : _120 _128 _129 j_131 _137 _138 _139 j_140 _146
_147 _148 j_149 _155 _156 _157 j_158 _164 _165 _166