On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod <amacl...@redhat.com> wrote:
>
> On 5/27/19 9:02 AM, Richard Biener wrote:
> > On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod <amacl...@redhat.com> wrote:
> >>
> >>> The above suggests that iff this is done at all it is not in GORI because
> >>> those are not conditional stmts or ranges from feeding those.  The
> >>> machinery doing the use-def walking from stmt context also cannot
> >>> come along these so I have the suspicion that Ranger cannot handle
> >>> telling us that for the stmt following above, for example
> >>>
> >>>    if (_5 != 0)
> >>>
> >>> that _5 is not zero?
> >>>
> >>> Can you clarify?
> >> So there are two aspects to this.  The range-ops code for DIV_EXPR, if
> >> asked for the range of op2, would return ~[0,0] for _5.
> >> But you are also correct in that the walk backwards would not find this.
> >>
> >> This is similar functionality to how null_derefs are currently handled,
> >> and in fact could probably be done simultaneously using the same code
> >> base.   I didn't bring null derefs up, but this is a good time :-)
> >>
> >> There is a separate class used by the gori-cache which tracks the
> >> non-nullness property at the block level.    It has a single API:
> >> non_null_deref_p (name, bb), which determines whether there is a
> >> dereference of NAME anywhere in BB, indicating whether the range has an
> >> implicit ~[0,0] in that basic block or not.
> > So when we then have
> >
> >   _1 = *_2; // after this _2 is non-NULL
> >   _3 = _1 + 1; // _3 is non-NULL
> >   _4 = *_3;
> > ...
> >
> > when an on-demand user asks whether _3 is non-NULL at the
> > point of _4 = *_3 we don't have this information?  Since the
> > per-BB caching will only say _1 is non-NULL after the BB.
> > I'm also not sure whether _3 ever gets non-NULL during
> > non-NULL processing of the block since walking immediate uses
> > doesn't really help here?
presumably _3 is globally non-null due to the definition being (pointer
+ x)  ... i.e., _3 has a global range of ~[0,0] ?

No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.
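The derivation Richard describes can be modelled with a toy pointer-range lattice, in which a pointer is either VARYING (may be NULL) or NONNULL (an implicit ~[0, 0]).  This is purely an illustrative sketch of the inference, not GCC's range-ops code; all names here are made up:

```python
# Toy pointer-range lattice: VARYING = may be NULL, NONNULL = ~[0, 0].
VARYING, NONNULL = "varying", "nonnull"

def pointer_plus_range(base):
    # Pointer arithmetic on a non-NULL pointer cannot yield NULL,
    # so a POINTER_PLUS_EXPR propagates non-NULL-ness from its base.
    return NONNULL if base == NONNULL else VARYING

def range_after_deref(ptr):
    # Once '*p' has executed without trapping, p is known non-NULL.
    return NONNULL

# Model the quoted GIMPLE:
#   _1 = *_2;     after this stmt _2 is non-NULL
#   _3 = _1 + 1;  _3 inherits non-NULL-ness from _1
r2 = range_after_deref(VARYING)   # _2 after the dereference
r1 = NONNULL                      # assume _1 is known non-NULL here
r3 = pointer_plus_range(r1)
assert r2 == NONNULL and r3 == NONNULL
```

The point of the sketch is the second rule: NONNULL survives pointer arithmetic, so _3 needs no dereference of its own to become ~[0, 0].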

> >
> > So this seems to be a fundamental limitation [to the caching scheme],
> > not sure if it is bad in practice.
> >
> > Or am I missing something?
> >
>
> Not missing anything.  The non-nullness property is maintained globally at
> the basic block level.  Both _1 and _3 are flagged as being non-null in
> the block.  Upon exit, it's a bit check.   If the global information does
> not have the non-nullness property, then when a request is made for
> non-nullness and the def and the use are both within the same block,
> and it's flagged as being non-null in that block, then the request is
> forced back to a quick walk between the def and the use to see if there
> is any non-nullness introduced in between.   Yes, that makes it a linear
> walk, but it's infrequent, and often short.  To the best of our knowledge
> at this point anyway :-)

So with the clarification above, do we ever see that _3 is non-NULL?
I suppose the worker processing _3 = _1 + 1 would ask for
_1's non-nullness, but we do not record any non-NULL-ness of _1 in
this basic-block (only at its end).  Consider the stmts

 _4 = (uintptr_t) _2;
 _5 = _6 / _4;
 _1 = *_2;
...

here at _1 we know _2 is not NULL.  But if we ask for the non-NULLness
of _2 at the definition of _4 we may not compute ~[0, 0], and thus
cannot conclude that _6 / _4 does not trap.

Stmt-level tracking of ranges is sometimes important.  This is
something the machinery cannot provide - correct?  At least not
optimistically enough with ranges derived from uses.
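The limitation can be made concrete with a toy model of the block-level cache Andrew describes: one bit per block saying "NAME is dereferenced somewhere in this block", safe to consult only on block exit, with a linear statement walk as the fallback for queries between a def and a use.  This is a sketch under those assumptions, not GCC's implementation:

```python
# Toy model of a per-block non-null cache.  Statements are encoded as
# (kind, ssa_name) pairs; only "deref" statements imply non-NULL-ness.

def non_null_on_exit(block_stmts, name):
    # Block-level bit: is `name` dereferenced anywhere in the block?
    # Only valid to rely on when leaving the block.
    return any(op == ("deref", name) for op in block_stmts)

def non_null_at_stmt(block_stmts, name, idx):
    # Stmt-level query via a linear walk: only derefs *before*
    # statement `idx` count, so a later '*_2' cannot retroactively
    # prove _2 non-NULL at an earlier statement.
    return any(op == ("deref", name) for op in block_stmts[:idx])

# The block from the example:
#   _4 = (uintptr_t) _2;
#   _5 = _6 / _4;
#   _1 = *_2;
stmts = [("cast", "_2"), ("div", "_4"), ("deref", "_2")]

assert non_null_on_exit(stmts, "_2")         # bit is set for the block
assert not non_null_at_stmt(stmts, "_2", 0)  # not provable at _4's def
```

The two asserts show exactly the gap raised above: the cache answers the block-exit question, but the stmt-level question at _4's definition comes back negative.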

> >>
> >> yes, compile-time complexity is from empirical speed timings and
> >> theory-crafting from the algorithms,  and that the on-entry cache
> >> prevents multiple passes over things.
> >>
> >> we have not done a memory analysis yet, not done anything to see if we
> >> can improve it.
> >> It makes very heavy use of bitmaps, which are typically quite sparse.
> >> The on-entry cache is a vector of pointers to a range, initially 0, and
> >> we allocate ranges as needed.   There will be an on-entry range entry
> >> for any ssa-name which has been queried between the query point and the
> >> definition.
> > So that's similar to LIVE-on-entry thus a SSA name with a range
> > will have an on-entry range entry on each BB it dominates?
> > That makes storage requirement quadratic in the worst case.
> Yes, assuming a request has been made for all ssa-names everywhere they
> are live.

You did propose to replace [E]VRP with a walk over the whole function
querying ranges and simplifying stmts, didn't you?

> >>   My belief is this is typically not large, but we have done
> >> no analysis as yet.  This does mean that once an ssa-name goes
> >> "out-of-scope", ie no  more uses in the program, it will not be in the
> >> cache for any of those blocks simply because its never queried past its
> >> end of life.
> >>> Not sure where to put it, so let me put it here:  I'd like to see the
> >>> SSA-based VRP pass die (in fact I'd like all ssa-propagator-based
> >>> passes to die...).
> >>> EVRP is the future here and I can see EVRP making use of the
> >>> reverse part of Range-Ops to replace the ad-hoc pattern matching
> >>> in register_edge_assert_for (a badly named function that now gets you
> >>> back a list of assertions that hold on an edge).  So I hope to see VRP
> >> Does register_edge_assert cache anything, or walk an assert chain?  Or
> >> is it basically just a "range-on-this-edge" call, with the result
> >> combined with what is known at that point?
> > register_edge_assert pattern-matches the stmts feeding a conditional
> > stmt on an edge.  It doesn't cache anything (so does quite some redundant
> > work when processing all outgoing edges).  It's not intended to be called
> > multiple times but rather when the DOM walker enters a block and we
> > look at the controlling statement to derive conditional ranges.  For
> > example for
> >
> >   _1 = (unsigned) _2;
> >   if (_1 > 1)
> >    ...
> >
> > it derives a vector of two relations on the true edge:
> >
> >   _1 > 1 == true
> >   (unsigned) _2 > 1 == true
> >
> > this is actually what we end up building ASSERT_EXPRs for in traditional
> > VRP, and what EVRP extracts ranges from, valid only in the dominated
> > context.
>
> and that gets mapped to the actual range [2, MAX] when?  That is what
> range_on_edge will produce for both  _1 and _2 on this edge.

In EVRP immediately.  In VRP when the later propagation stage visits the
newly inserted ASSERT_EXPRs.
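The "reverse" derivation being discussed, from the condition `(unsigned) _2 > 1` being true on an edge back to ranges for _1 and _2, can be sketched as a toy reverse range operation.  This is an illustrative model (assuming _2 is also a 32-bit unsigned value, so the cast does not change the value range), not the register_edge_assert_for or range-ops code itself:

```python
# Toy reverse range-op for an unsigned 'x > bound' comparison:
# given the comparison's known truth value, derive the range of x.

UINT_MAX = 2**32 - 1

def gt_op1_range(bound, result_true):
    # If 'x > bound' is true, x is in [bound + 1, UINT_MAX];
    # if false, x is in [0, bound] (unsigned).
    if result_true:
        return (bound + 1, UINT_MAX)
    return (0, bound)

# Model the quoted GIMPLE on the true edge of 'if (_1 > 1)':
r1 = gt_op1_range(1, True)   # range of _1: [2, UINT_MAX]
r2 = r1                      # _1 = (unsigned) _2; with _2 unsigned the
                             # cast is value-preserving, so _2 gets the
                             # same [2, UINT_MAX]
assert r1 == (2, UINT_MAX)
```

This is the mapping Andrew asks about: the relation `(unsigned) _2 > 1 == true` becomes the concrete range [2, MAX] for both names, immediately in EVRP and at propagation time in VRP.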

> If we were to go to the effort of handling symbolics, and the expression was
>    if (_1 > _2)
> then we'd get ranges _1 = [_2 + 1, MAX] and _2 = [MIN, _1 - 1]  from
> range_on_edge
>
>
> Andrew
