On 6/4/19 1:07 PM, Richard Biener wrote:
On June 4, 2019 6:50:07 PM GMT+02:00, Andrew MacLeod <amacl...@redhat.com>
wrote:
On 6/4/19 11:37 AM, Richard Biener wrote:
where the single(!) query of the range of i on the alloca call will populate N BBs' caches, with the Nth having ranges for all SSA defs of i, running into quadraticness in N.
Well, not really. It's still linear, since the 'i' used in each foo() will be dead, so the next block will contain no range for it:
  try {
    long i_2 = _1;
    i_3 += foo (i_2);
    i_4 += foo (i_3);
    i_5 += foo (i_4);
    i_6 += foo (i_5);
    ... repeat N times ...
    alloca (i_6);   // query range of i
Once i_2 is 'dead' after the first call to foo(), it will no longer be in the on-entry cache of any other block. The walk back never sees i_2 until the first call to foo()...
It's actually

  utemp_2 = foo (i_1);
BB:
  i_3 = i_1 + (long) utemp_2;
  utemp_4 = foo (i_3);

eliding the separate stmt for the conversion. From your description of the cache it sounded like getting incoming values is a take-all operation. Otherwise you'd need negative entries as well (I thought the cache might not contain varying ranges, for example).
OK, utemp_2 and i_1 will be live, so there are 2 ranges on entry. The next block will have only utemp_4 and i_3 live on entry. So it'll be 2 ranges per block.

The cache does currently contain varying, but the varying value is cached once for the entire ssa_name, so 500 BBs with a varying range will only consume a single range record.
By negative entries, I assume you mean something like the bottom of a diamond?

  if (x_3 > 20)
    blah1 (x_3);
  else
    blah2 (x_3);
  blah3 (x_3);
It does nothing special to come up with the range of x_3 at blah3(). It simply unions all the incoming ranges, which is [21, MAX] union [MIN, 20], to produce the value [MIN, MAX], which is then put into the cache. We don't use anything special to represent varying; varying_p() merely checks whether the range is [MIN, MAX]. And when putting a value into the cache, if it is varying_p(), the cache simply uses its unique copy of [MIN, MAX] rather than creating a new one every time it is needed.
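To make that concrete, here is a toy C++ model of the scheme (names like cache::set and varying_record are invented for illustration; this is not the actual irange or cache API): the join produces [MIN, MAX], varying_p() detects it, and every block's entry then points at one shared record.

  #include <algorithm>
  #include <climits>
  #include <cstdio>
  #include <map>
  #include <memory>
  #include <utility>

  struct range
  {
    long lo, hi;
    // Varying is not a special state, merely the full span [MIN, MAX].
    bool varying_p () const { return lo == LONG_MIN && hi == LONG_MAX; }
  };

  class cache
  {
    // One on-entry record per (bb, ssa-name) pair...
    std::map<std::pair<int, int>, std::shared_ptr<range>> on_entry;
    // ...except varying, which is shared across all blocks per ssa-name.
    std::map<int, std::shared_ptr<range>> varying_record;

  public:
    void set (int bb, int ssa, const range &r)
    {
      if (r.varying_p ())
        {
          auto &v = varying_record[ssa];
          if (!v)
            v = std::make_shared<range> (r);
          on_entry[{bb, ssa}] = v;   // 500 BBs -> still one record
        }
      else
        on_entry[{bb, ssa}] = std::make_shared<range> (r);
    }
  };

  int main ()
  {
    // The diamond above: [21, MAX] union [MIN, 20] == [MIN, MAX].
    range t = { 21, LONG_MAX };
    range f = { LONG_MIN, 20 };
    range joined = { std::min (t.lo, f.lo), std::max (t.hi, f.hi) };
    std::printf ("varying_p () == %d\n", joined.varying_p ());

    cache c;
    c.set (/*bb=*/7, /*ssa=*/3, joined);   // stores the shared copy
  }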
I had originally considered not caching ranges spanning blocks in which the range never changes... The trade-off is an increase in calculation time as you walk the CFG to find the last block in which the range did change. I figured we'd first see whether the less error-prone full cache causes any real problems or not. It's always an option to implement later, since the cache is self-contained and can resolve its queries internally however it wants.
...so none of the blocks after that have a need for its range, and it will not be in their caches.

The on-entry cache will only get a single range entry for each of those blocks: the incoming value of i_x, and that's it. The implicitly known live-on-exit attribute is handy here.
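As a sketch of why this stays linear, here is a minimal standalone C++ model (the block numbers and live-on-entry sets are made up; nothing here is ranger code): each block contributes records only for its live names, so the total grows with the block count rather than its square.

  #include <cstdio>
  #include <map>
  #include <set>
  #include <string>

  int main ()
  {
    // Live-on-entry sets for the chain of blocks in the example: each
    // block sees only the current i_N and the utemp from the preceding
    // call, never the dead earlier names.
    std::map<int, std::set<std::string>> live_on_entry = {
      { 2, { "i_1", "utemp_2" } },
      { 3, { "i_3", "utemp_4" } },
      { 4, { "i_5", "utemp_6" } },
      // ... two names per block, no matter how long the chain is ...
    };

    std::size_t total = 0;
    for (const auto &bb : live_on_entry)
      {
        total += bb.second.size ();   // one cache record per live name
        std::printf ("BB%d: %zu on-entry records\n",
                     bb.first, bb.second.size ());
      }
    std::printf ("total records: %zu (linear in the block count)\n", total);
  }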
Now, that's not to say we can't construct cases where lots of ranges get populated. If we find the cache is really a problem, we can start sharing the range records themselves, so that even if all these i_x's were somehow live, there would only be one range record for each in this case, and the caches would simply all point to the same range.
I suppose using i in the catch block would create something like this, a PHI with a large in-degree, and thus the switch case you already know about?
It would be a PHI. None of the ranges coming into the block are considered live on entry, so they don't get into the cache. The PHI calculation asks for the incoming range on the edge for each argument, we get a global range calculated for the PHI definition, and that is it.
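A hedged sketch of that PHI evaluation, with invented helpers (range_on_edge here just returns canned values; in reality the edge ranges come from the solver): the result is simply the union of each argument's range on its incoming edge.

  #include <algorithm>
  #include <cstdio>

  struct range { long lo, hi; };

  static range
  range_union (range a, range b)
  {
    return { std::min (a.lo, b.lo), std::max (a.hi, b.hi) };
  }

  // Stand-in for "range of the PHI argument on its incoming edge".
  static range
  range_on_edge (int arg)
  {
    static const range edge_range[] = { { 0, 9 }, { 20, 20 }, { 5, 30 } };
    return edge_range[arg];
  }

  int main ()
  {
    // i_7 = PHI <i_2(e0), i_4(e1), i_5(e2)>
    range result = range_on_edge (0);
    for (int arg = 1; arg < 3; arg++)
      result = range_union (result, range_on_edge (arg));
    std::printf ("i_7 : [%ld, %ld]\n", result.lo, result.hi);  // [0, 30]
  }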
However, that's not actually the switch problem :-) The switch problem isn't due to a large number of incoming edges, but rather the difficulty of calculating case ranges on the outgoing edges of the switch itself.

Stepping back slightly: branches are handled generically the same way any expression is; the outgoing edge you want information about provides a constant range for the implicit LHS of the branch.
So for an if:

  c_2 = x_3 > 20
  if (c_2 != 0)

the TRUE edge has a range of bool [1,1], fed back as the implicit LHS of the branch expression:

  [1,1] = (c_2 != 0), from which range-ops for '!=' concludes c_2 must be [1,1] to satisfy the equation, and then
  [1,1] = (x_3 > 20), from which range-ops for '>' concludes x_3 must have a range of [21, MAX].

If you want the range on the FALSE edge, the edge feeds the range [0,0] into the equation solver, and out pops [MIN, 20].
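Here is a toy version of that back-solving step (solve_gt is a stand-in for what range-ops' '>' handler computes; it is not GCC's actual class): feed in the known LHS range for the edge and recover the operand range.

  #include <climits>
  #include <cstdio>

  struct range { long lo, hi; };

  // Back-solve 'x > c' from the known range of the comparison result:
  // a TRUE result forces x into [c+1, MAX], a FALSE one into [MIN, c].
  static range
  solve_gt (bool lhs_true, long c)
  {
    if (lhs_true)
      return { c + 1, LONG_MAX };
    return { LONG_MIN, c };
  }

  int main ()
  {
    // c_2 = x_3 > 20; if (c_2 != 0)
    // TRUE edge: [1,1] = (c_2 != 0) => c_2 is [1,1] => x_3 is [21, MAX].
    range t = solve_gt (true, 20);
    // FALSE edge: feed [0,0] instead, and out pops [MIN, 20].
    range f = solve_gt (false, 20);
    std::printf ("TRUE  edge: x_3 in [%ld, %ld]\n", t.lo, t.hi);
    std::printf ("FALSE edge: x_3 in [%ld, %ld]\n", f.lo, f.hi);
  }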
For a switch, say

  switch (i_3)

it works the same way, except the edge has an integral value rather than a boolean one. This range is fed back into the implicit branch expression:

  [edge range] = i_3

so i_3 has whatever value the edge has, since the branch amounts to a copy. Calculating this edge constant value is the problem.
BB2:
  switch (i_3) {

BB3:
  case 4:
  case 5:
  case 9:
  case 10:
    foo (i_3);
In order to evaluate the case values on the edge from BB2 to BB3 as [4,5][9,10], we have to loop over all the cases that have a label at the beginning of the block and union them together. Calculating the default value is even worse: start with varying and intersect out each and every case.

The gimple representation does not make evaluating this easy or cheap, since it merely has a list of case LOW/HIGH pairs with labels... so you have to loop through that entire list for each edge being evaluated, to see whether the label is associated with this edge, and then union together all those that match. That scan is linear and can't be shortcut. Multiply it by the number of edges, since you probably want a range on each edge, and it's suddenly quadratic. The result is that it can be *very* time consuming if the switch is very, very large.
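To illustrate the cost, here is a standalone model (case_entry and the coalescing loop are invented; gimple's real representation is the label list described above): answering one edge's query means scanning the whole case list, and every additional edge queried repeats that scan.

  #include <cstdio>
  #include <vector>

  struct case_entry { long low, high; int edge; };
  struct range { long lo, hi; };

  int main ()
  {
    // The switch above: cases 4, 5, 9 and 10 all branch to BB3.
    std::vector<case_entry> cases
      = { { 4, 4, 3 }, { 5, 5, 3 }, { 9, 9, 3 }, { 10, 10, 3 },
          { 42, 50, 4 } };   // a case bound for some other edge

    int query_edge = 3;
    std::vector<range> result;        // will become [4,5][9,10]
    for (const auto &c : cases)       // full scan; no shortcut exists
      if (c.edge == query_edge)
        {
          if (!result.empty () && result.back ().hi + 1 == c.low)
            result.back ().hi = c.high;   // coalesce adjacent cases
          else
            result.push_back ({ c.low, c.high });
        }

    for (const auto &r : result)
      std::printf ("[%ld, %ld]", r.lo, r.hi);
    std::printf ("\n");
  }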
And it's a constant value that never changes, so it really could be cheap. We've just never needed to ask this exact question before. We've considered multiple ways to address this, but we won't worry about any of them just yet. All in time :-)
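Purely as an illustration of why it could be cheap (this is one hypothetical shape, not any of the approaches actually under consideration), a single linear pass can bucket the LOW/HIGH pairs by edge once, and the resulting map never changes, so it can be cached:

  #include <map>
  #include <vector>

  struct range { long lo, hi; };
  struct case_entry { long low, high; int edge; };

  // One linear pass over the case list builds every edge's range list
  // at once; the result can then be cached, since it never changes.
  std::map<int, std::vector<range>>
  precompute_edge_ranges (const std::vector<case_entry> &cases)
  {
    std::map<int, std::vector<range>> per_edge;
    for (const auto &c : cases)
      per_edge[c.edge].push_back ({ c.low, c.high });
    return per_edge;
  }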
So far we haven't really run across anything that appears to trigger significant concerns.
Sure. It's still worth thinking about worst-case behavior and how you'd deal with it, given that once ranger is actually used we will eventually run into these cases. If a forward EVRP walk is then faster than a single on-demand query, that would be concerning.
Richard.
Absolutely. I've been concerned about the worst case since the get-go. That's how it evolved to this design. We're trying to make sure the worst-case scenario is no worse than doing a normal walk.

We can currently run the ranger version of RVRP across blocks in dominator order, reverse dominator order, linearly BB1 -> BBN, and finally BBN -> BB1. There are fluctuations in timings, but nothing of much significance. They all run in similar times and pass all the tests.
I have thoughts on how the various pieces can fit together with EVRP; I'll stick that in another email.
Andrew