On Mon, Jun 4, 2018 at 5:17 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, Jun 1, 2018 at 10:38 PM Andrew MacLeod <amacl...@redhat.com> wrote:
> >
> > On 06/01/2018 05:48 AM, Richard Biener wrote:
> > >
> > > On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod <amacl...@redhat.com> wrote:
>
> bah, gmail now completely mangles quoting :/
> Ah no, you sent a multipart mail and it showed the html variant. Replying in plain-text mode messes things up then. And your mail likely got rejected by the list anyway.
>
> > This allows queries for a range on an edge, on entry to a block, as an operand on a specific statement, or to calculate the range of the result of a statement. There are no prerequisites to use it; simply create a path ranger and start using the API. There is even an available function which can be lightly called and doesn't require knowledge of the ranger:
> >
> > So what the wiki pages do not show is how contextual information is handled (and how that's done w/o dominators, as the wiki page claims). That is, path_range_on_edge suggests that range information provided by (all) the controlling stmts of that edge is used when determining the range for 'name'.
> >
> > The wiki doesn't talk about it, but the document itself goes into great detail about how the dynamic process works.
> >
> > That's probably true for the 'stmt' variant as well.
> >
> > This leads me to guess that either the cache has to be O(number of SSA uses) in size, or a simple VRP pass implemented using Ranger, querying ranges of all SSA ops for each stmt, will require O(number of SSA uses) times analysis?
> >
> > The current cache is not particularly efficient; its size is #ssa_names * #BB, assuming you actually go and look at every basic block, which many passes do not. It also only puts a range in there when it has to. I'm not sure why it really matters unless we find a pathological case that kills compile time which cannot be addressed. This would all fall under performance analysis and tweaking. It's the end performance that matters.
> >
> > When a range request is made for NAME in a block, it requests a range on each incoming edge, and then unions those ranges, and caches it. This is the range-on-entry cache.
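The on-entry merging described above can be sketched with a toy range class. This is purely illustrative (the names toy_range, toy_ranger, and the single-subrange union are assumptions of this sketch, not the branch's actual irange/path_ranger API): the entry range of a name in a block is the union of the ranges on all incoming edges, memoized so later requests are a cache lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <vector>

// Toy single-subrange value standing in for irange: inclusive integer bounds.
struct toy_range
{
  int lo, hi;
  bool operator== (const toy_range &o) const { return lo == o.lo && hi == o.hi; }
};

// Union two ranges into one enclosing subrange (simplified to a single
// subrange; the real representation supports multiple subranges).
static toy_range union_ (toy_range a, toy_range b)
{
  return toy_range { std::min (a.lo, b.lo), std::max (a.hi, b.hi) };
}

// Sketch of the range-on-entry cache described above.
struct toy_ranger
{
  // incoming[bb] holds the (already computed) range of NAME on each
  // incoming edge of bb; in the real ranger this is where the edge
  // calculation and gori_map query would happen.
  std::map<int, std::vector<toy_range>> incoming;
  std::map<int, toy_range> on_entry_cache;

  toy_range range_on_entry (int bb)
  {
    auto cached = on_entry_cache.find (bb);
    if (cached != on_entry_cache.end ())
      return cached->second;          // later requests hit the cache
    toy_range r = incoming[bb][0];
    for (size_t i = 1; i < incoming[bb].size (); i++)
      r = union_ (r, incoming[bb][i]);
    on_entry_cache[bb] = r;           // memoize for future requests
    return r;
  }
};
```

For a diamond where one edge carries [11, MAX] and the other [MIN, 10], the merge at the join block yields [MIN, MAX], matching the behaviour described later in the thread.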
> > The next time a range request for NAME is made on entry to that block, it simply picks it up from the cache.
> >
> > The requests for a range on an edge use the gori_map summary to decide whether that block creates a range for NAME or not. If the block does not generate a range, it simply requests the range-on-entry to that block. If it does generate a range, then it examines the statements defining the condition, gets ranges for those, and calculates the range that would be generated on that edge. It also requests ranges for any of those names in the defining chain which originate outside this block.
> >
> > So it queries all the names which feed other names, and so on. The queries end when a statement is found that doesn't generate a useful range. Sometimes it has to go hunt quite a way, but usually they are nearby. And caching the range-on-entry values prevents the lookups from doing the same work over and over.
> >
> > Once a range has been calculated, we don't spend much more time calculating it again. Any other ssa-name which uses that name also doesn't need to recalculate it. For the most part, we walk the def chains for any given ssa-name and its feeding names once, and future requests pretty much come from the cache.
> >
> > So yes, there is also a lot of recursion in here at the moment: calls for ranges spawning other calls before they can be resolved. I suspect sometimes this call chain is deep. I don't know; we haven't performed any analysis on it yet.
> >
> > One of the advantages of a DOM-style or SSA-with-ASSERT_EXPR pass is that the number of evaluations and the size of the cache isn't "unbound" that way. On the WIKI page you mention edge info isn't cached yet - whatever that means ;)
> >
> > So I wonder what's the speed for doing
> >
> > The on-entry caches prevent it from being "unbound".
> > And the intersection of incoming edge calculations performs the equivalent merging of ranges that DOM-based processing gives you.
> >
> >   if (x > 10)
> >     A
> >   else
> >     B
> >   C
> >
> > block A has x [11, MAX]
> > block B has x [MIN, 10]
> > In block C, at the bottom of the diamond, the 2 incoming edges union those 2 ranges back to [MIN, MAX] and we know nothing about the range of X again. This accomplishes the same thing dominators would tell you.
> >
> > The edge info that "isn't cached yet" (and may not ever be) is simply the 'static' info calculated by evaluating the few statements at the bottom of the block feeding the condition. Any values coming into the block are cached. In this case it's simply looking at x > 10 and calculating the range on whichever edge. One of the original design elements called for this to be cached, but in practice I don't think it needs to be. But we haven't analyzed the performance yet... so it can only get quicker :-)
> >
> >   FOR_EACH_BB_FN (bb)
> >     FOR_EACH_STMT (stmt)
> >       FOR_EACH_SSA_USE (stmt)
> >         path_range_on_stmt (..., SSA-USE, stmt);
> >
> > assuming that it takes into account controlling predicates. That's actually what EVRP does (within a DOM walk), given it tries to simplify each stmt based on range info of the ops.
> >
> >   FOR_EACH_BB_FN (bb)
> >     FOR_EACH_STMT (stmt)
> >       path_range_on_stmt (..., stmt);
> >
> > would be sufficient. In order to calculate the range for the definition of the statement, the Ranger will query the range of each SSA operand.
> >
> > I had Aldy run his cumulative .ii times with a patch added to EVRP which adds a ranger and makes a call to path_range_stmt on each PHI and every stmt as evrp processes it.
> >
> >   evrp original time:  2025055 usec
> >   evrp + ranger:       3787678 usec
> >   -------
> >   ranger time:         1762623 usec
> >
> > So to be fair, we aren't actually doing anything and evrp is, but the actual folding of statements, which is the only additional thing EVRP is doing, is not a big time killer. This indicates that for a full-on stmt-by-stmt walk and process, we're certainly in the same time ballpark.
> >
> > A significant point is that the stmt-by-stmt walk is also not the intended target. That's old school now :-) Many clients only care about a few ranges.
> >
> > The "quick and dirty" RVRP pass only looks at the last condition in the block and catches the vast majority of what the stmt-by-stmt walk accomplishes. If timings for the RVRP pass are any indication, I would say it walks/calculates 60% of the statements in the program to calculate flow conditions. Processing a lot of arithmetic statements that can never result in a fold seems kinda pointless; leave those to constant prop (which could use the ranger as a client). The extra stmt-by-stmt opportunities found mostly appear to be PHIs that can be folded and the odd condition that doesn't feed a branch.
> >
> > EVRP also *has* to walk each statement because that is the model it uses to find ranges: build them from the top down. The ranger model does not have this need. If uses of the ranger were more prolific in other passes, I think a lot of things the existing VRP passes get on these statements would be caught in the normal processing the other passes do. We did put a stmt-by-stmt RVRP on the todo list, mostly so we could get a better apples-to-apples time comparison and to make sure we would eventually have full coverage of VRP's abilities.
> >
> > I would also note again that we have made no attempt to optimize things yet. We don't really know how much extra work we're doing that doesn't need to be redone.
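The "last condition in the block" evaluation that the quick RVRP pass relies on can be sketched as deriving edge ranges directly from the comparison. This is a toy model with assumed names (edge_range_for_gt, toy_range), not the branch's range-ops code, and it handles only the x > val case:

```cpp
#include <cassert>
#include <climits>

// Toy range: inclusive integer bounds.
struct toy_range { int lo, hi; };

// Given a block ending in "if (x > val)", compute the range of x on the
// true and false outgoing edges.  (Assumes val < INT_MAX so val + 1 does
// not overflow; the real code handles all comparison codes and types.)
static toy_range edge_range_for_gt (int val, bool true_edge)
{
  if (true_edge)
    return toy_range { val + 1, INT_MAX };   // x > 10  ->  x in [11, MAX]
  return toy_range { INT_MIN, val };         // !(x > 10) -> x in [MIN, 10]
}
```

This is exactly the 'static' evaluation mentioned earlier for the diamond example: x > 10 yields [11, MAX] on the true edge and [MIN, 10] on the false edge.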
> > We just know the current speed is a pretty good starting point. We're seeing big improvements in the passes that are changed to use it.
> >
> > We will get to finding the inefficiencies and try to address them.
> >
> > More generally, my remarks are still the same as with the irange introduction attempt.
> >
> > It's a no-no to add yet another set of range op primitives given we already have them implemented (with trees, yes...) in tree-vrp.c. The existing ones are already factored out to some extent, so I expect you will continue that and split out the wide-int cases from those existing routines. Yes - that will not be very C++-ish - but who cares. Simply implement C-style workers; for example, be inspired by the tree-ssa-ccp.c:bit_value_* routines implementing a kind of non-zero-bits operation primitives on top of wide_ints (and also wrapping those with tree variants).
> >
> > Sharing of code to do the same work was covered as something we will get to. Aldy is working on that, so there is no dual maintenance. I had started it with the wide-int-aux.[ch] file in the branch but never got very far. Yeah, it's not super pretty, but it is what it is.
> >
> > It won't be perfect, however, as many of those routines are written with the assumption that ranges are either a single range or an anti-range. We no longer have that restriction, and we will diverge in those cases.
>
> But having 1 single range or N ranges isn't different when it comes to the core working routines. Because if you have N ranges it's simply: apply the op N times and union the result. Yes - the special case is now that OP (range) can result in multiple ranges, but currently all interesting cases you likely implement are simply covered by returning either a range or an anti-range. Because more generally you'd not want N ranges but some affine thing, so you can handle even more cases without blowing up storage.
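The "apply the op N times and union the result" point can be sketched with a toy multi-subrange value. The representation and names here (multi_range, apply_op) are assumptions of this sketch, not the branch's irange; it applies a unary op to each subrange and then merges overlapping or adjacent results:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// A multi-subrange value as a sorted list of disjoint inclusive [lo, hi]
// pairs, standing in for an N-subrange irange.
typedef std::vector<std::pair<int, int>> multi_range;

// Apply a unary op to an N-subrange value: apply it to each subrange and
// union the results.  (Assumes OP is monotonic, so op([lo, hi]) is bounded
// by op(lo) and op(hi); non-monotonic ops need more care.)
static multi_range apply_op (const multi_range &r, int (*op) (int))
{
  multi_range result;
  for (const auto &sub : r)
    {
      int lo = op (sub.first), hi = op (sub.second);
      result.push_back ({ std::min (lo, hi), std::max (lo, hi) });
    }
  // Union step: sort, then merge overlapping or adjacent subranges.
  std::sort (result.begin (), result.end ());
  multi_range merged;
  for (const auto &sub : result)
    if (!merged.empty () && sub.first <= merged.back ().second + 1)
      merged.back ().second = std::max (merged.back ().second, sub.second);
    else
      merged.push_back (sub);
  return merged;
}
```

With single-range/anti-range representations the same loop degenerates to one or two iterations, which is why the core working routines need not differ.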
> > Extricating the code from the symbolic handling is also not straightforward sometimes, as there are big switches which mix and match various bits of each case.
>
> Yes. Which is why I never really got far with dumping the symbolic stuff, because it's hard to come up with a replacement. I guess in reality you want to model symbolic ranges as constraints on affine expressions and then have some constraint solver to answer queries -- at least for symbolic ranges the interesting transforms involve comparison simplifications (aka jump threading) only.
>
> > We will however try to distill it down to the wide-int aspects that are common.
> >
> > I would also argue that long term I see value_range disappearing completely... and then there won't be any dual maintenance :-)
> >
> > For most of your on-demand uses I would expect a _much_ simpler and less heavy-weight approach to be extending the recently merged determine_value_range (I've pointed you to this multiple times in the past) to walk SSA def-use chains using the existing SSA on-the-side info as a cache (yes, w/o use-granular contextual information).
> >
> > I think what we have is pretty lightweight and more powerful than that would be. I started from scratch so we could get all the ideas in one place working together and not suffer from limits imposed by existing assumptions or information not easily available. I think our results confirm those benefits. It's fast, easy to use, and it's already starting to get things EVRP does not get. It is also showing signs of finding things VRP itself has trouble with. When we handle equivalencies I would expect we will exceed current capabilities.
> >
> > I also think that purely restricting yourself to wide-ints is a mistake - we do very well want to have range information for REAL_CSTs (or composite integers / reals).
> > How do you think of extending Ranger to handle different kinds of types? Eric already raised the issue of symbolic ranges.
> >
> > This is in fact one of the main reasons we wrote everything new, including a distinct range class. We expected to want to make this work on other types.
> >
> > Irange and range-ops is a class to handle integer ranges. We have a very well defined API for ranges now. The range engine itself simply uses this API.
> >
> > If we write a class 'real_range' to manipulate & store real ranges and implement 'real_range_ops' to teach the compiler how real ranges are extracted from expressions, the entire ranger engine will work on reals.
> >
> > There are 2 options to proceed with this:
> >
> > a) we'd either template the ranger to use <class Range> instead of class irange, or at least extract out the common code and template that. And then we'd have the same on-demand ranger for "real" ranges, or "complex" or whatever. So instead of invoking it with 'path_ranger r', it'd be something like 'path_ranger<irange> r' or 'path_ranger<real_range> r'.
> >
> > I like that less because I hate templates, but it is an option. The more likely case I planned for is:
> >
> > b) adjust class irange to derive from a base range class with the API as virtual functions, and inherit irange and real_range from that for the ranger to call. Then it would handle both simultaneously. We'd have some tweaking to do where the ranger currently checks the type of things to be integral, but nothing too serious. The entire structure of the range engine will still apply, and we could process both simultaneously.
> >
> > The second model was also intended from the beginning to allow us to adjust the number of sub-ranges dynamically. Right now we set it to 3 sub-ranges at compile time, but the intention is to allow this to vary if we want.
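Option (b) above can be sketched as follows. All class and method names here (base_range, toy_int_range, toy_real_range, varying_p, empty_p) are hypothetical stand-ins invented for this sketch; the point is only the shape of the design, where the engine is written against a virtual base API so integer and real ranges flow through the same code:

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// A common abstract range API that both an integer range and a "real"
// range implement, so the ranger engine only sees the base class.
class base_range
{
public:
  virtual ~base_range () {}
  virtual bool varying_p () const = 0;   // range covers the whole type
  virtual bool empty_p () const = 0;     // undefined / empty range
};

class toy_int_range : public base_range
{
  long m_lo, m_hi;
public:
  toy_int_range (long lo, long hi) : m_lo (lo), m_hi (hi) {}
  bool varying_p () const override
    { return m_lo == LONG_MIN && m_hi == LONG_MAX; }
  bool empty_p () const override { return m_lo > m_hi; }
};

class toy_real_range : public base_range
{
  double m_lo, m_hi;
public:
  toy_real_range (double lo, double hi) : m_lo (lo), m_hi (hi) {}
  bool varying_p () const override
    { return m_lo == -HUGE_VAL && m_hi == HUGE_VAL; }
  bool empty_p () const override { return m_lo > m_hi; }
};

// An "engine" routine that works for either kind through the base API.
static bool engine_sees_varying (const base_range &r)
{
  return r.varying_p ();
}
```

Compared with the template option (a), this pays a virtual call per query but lets one engine instance handle both range kinds at the same time.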
> > There are cases where some optimization might be willing to spend extra time/memory in order to get really accurate range info. Maybe switch analysis could use an irange which allowed up to 1000 subranges, so it could very precisely map what the value is at any given point. The general case doesn't need that extra memory usage, however.
> >
> > It just gives us flexibility we don't currently have. We also expect to do some tests and find out what the "optimum" default number of ranges is. I chose 3 because it was handling cases I wanted to get that we couldn't get before. Maybe a different number is better; we'd have to run some experiments to see how often we could use more ranges.
> >
> > Yes, I do like having wide-int workers for ranges.
> >
> > Yes, I do like the idea of making ranges not restricted to single sub-ranges (didn't review the implementation).
> >
> > But those things should have been done on the existing code, not sticking sth completely new into GCC that cannot subsume the existing stuff. That's the road to maintenance hell which we unfortunately walk way too often with GCC :/
> >
> > I argue that this will subsume all the existing code related to it. Sure, we cannot right this moment; it's not a small task. I think our pathway out of maintenance hell involves removing the existing value_range code and the various incarnations of VRP we have, and using something modern designed from scratch to solve these problems. I would also predict we could do it in the release after this next one. So we'd have one release with the ranger and VRP brothers both present, and after that it would be just the Ranger.
> >
> > Btw, the original goal of EVRP was to get rid of the ASSERT_EXPR-style VRP pass once the threading bits it has are subsumed by the backwards threader. Does ranger at least allow that - get rid of the threading inside VRP without regressions?
> >
> > The ranger calculates ranges. It does not implement a complete VRP replacement. We will implement a full VRP in time. One of the original goals was to get this working so we can remove all the micro-optimizations that VRP does because range info isn't available anywhere else. The clean goal is that all the threading-related range work should be done in threading via calls to the ranger.
> >
> > So to more directly answer the question: I do not know the state today. I'm not the threader dude :-) Aldy enhanced the backward threader to do some things we couldn't do before, and converted those other 3 passes. I think Jeff has on his plate to remove the threading parts from VRP and put them back in the threader, so he can clarify this. By the end of the stage I would expect that threader work to be complete and out of VRP.
> >
> > Just as a final note - open source development doesn't work like this, citing you: "I'd like to introduce a project we've been working on for the past year and a half." - this simply provokes "reviews" like the above.
> >
> > Well, it is what it is. I see nothing wrong with the "review"; quite frankly, it's pretty much as expected. I think it would have been almost identical if I'd talked about it last year.
> >
> > And why can't it work like this? We aren't suggesting we shove this into trunk right now. We've been experimenting with the technology, and now we think it's developed to the point where it can be of benefit, and we can demonstrate it works. It's on a branch, and if anyone else wants to help, awesome. We have a to-do list. It'll probably be months before we're even ready to be considered for a merge. It has now advanced to the point where we have tangible things to talk about for the first time.
> >
> > Sure, we could replace irange with value_range and range-ops with some serious reconfiguration of the range extraction code, and the ranger would still work. It just needs the same range API. I happen to think irange and range-ops offer benefits that value_range can't, and so to do that would be a step backwards.
> >
> > I chose to create irange so we had the flexibility to enhance its ability in the future while offering some improvements now. Range-ops was a desire to unite the range processing code in one place, and I needed to add the ability to do algebraic rearrangements that we currently don't do in VRP... the op1_irange and op2_irange code that derives ranges for operands given a LHS and the other operand. Yeah, it's not in one place yet, but it will be.
> >
> > So at what point do we decide community involvement is appropriate? As soon as I get the idea? Maybe after I had the third prototype sort-of working, indicating the approach was feasible? It was pretty ugly then; I wouldn't want anyone to have seen that :-) All the exact same questions would have come up then as now: we need to handle symbolics; too expensive; reuse the existing range code. Rather than waving my hands to answer and not really accomplishing anything, I can now point to tangible things. (mostly :-) I would have still pushed for irange and range-ops because I think they are the right strategic solution. Maybe someone else could have helped advance things a little faster, but I was a pretty big bottleneck on design and reworking things. Ask Aldy.
> >
> > The only thing that we ever had working that was even worth considering submitting was the irange work last July, and that did not get very far. We could also have waited another year until we had a complete VRP replacement working.
> > Then it's a drop-in solution that removes a lot of the arguments about code sharing and not re-using value_range. That might have been even easier! I do think it's reached the point where there is a lot of benefit to be derived now, and so here it is; let's figure out what we can do with it.
> >
> > Anyway, the "lateness" of the introduction is purely on me. I like to get things working before talking about them, especially when they start with half-baked ideas.
>
> I would have liked to see factoring out from the current code first, and actually enhancing the current value-range representation by removing the restriction on single ranges.
>
> I don't very much believe in the "start from scratch" approach. Because history shows we always end up with both, the old and the new, afterwards.
>
> Richard.
>
> > I think this is a good strategic direction to go: it demonstrates benefit now, and has a path to a better, easier-to-maintain VRP with a compiler-wide range resource.
> >
> > Andrew