On Mon, Jun 4, 2018 at 5:17 PM Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, Jun 1, 2018 at 10:38 PM Andrew MacLeod <amacl...@redhat.com> wrote:
> >
> > On 06/01/2018 05:48 AM, Richard Biener wrote:
> > >
> > > On Wed, May 30, 2018 at 1:53 AM Andrew MacLeod <amacl...@redhat.com> wrote:
>
> bah, gmail now completely mangles quoting :/
> Ah no, you sent a multipart mail and it showed the html variant. Replying in plain-text mode messes things up then. And your mail likely got rejected by the list anyway.
>
> > This allows queries for a range on an edge, on entry to a block, as an operand on a specific statement, or to calculate the range of the result of a statement. There are no prerequisites to use it; simply create a path ranger and start using the API. There is even an available function which can be lightly called and doesn't require knowledge of the ranger:
> >
> > So what the wiki pages do not show is how contextual information is handled (and how that's done w/o dominators, as the wiki page claims). That is, path_range_on_edge suggests that range information provided by (all) the controlling stmts of that edge is used when determining the range for 'name'.
> >
> > The wiki doesn't talk about it, but the document itself goes into great detail about how the dynamic process works.
> >
> > That's probably true for the 'stmt' variant as well.
> >
> > This leads me to guess that either the cache has to be O(number of SSA uses) in size, or a simple VRP pass implemented using Ranger, querying ranges of all SSA ops for each stmt, will require O(number of SSA uses) times analysis?
> >
> > The current cache is not particularly efficient; its size is #ssa_names * #BB, assuming you actually go and look at every basic block, which many passes do not. It also only puts a range in there when it has to. I'm not sure why it really matters unless we find a pathological case that kills compile time which cannot be addressed. This would all fall under performance analysis and tweaking. It's the end performance that matters.
> >
> > When a range request is made for NAME in a block, it requests a range on each incoming edge, and then unions those ranges, and caches it. This is the range-on-entry cache.
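The on-entry merging described above can be sketched with a toy range class. This is purely illustrative (the names toy_range, toy_ranger, and the single-subrange union are assumptions of this sketch, not the branch's actual irange/path_ranger API): the entry range of a name in a block is the union of the ranges on all incoming edges, memoized so later requests are a cache lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <vector>

// Toy single-subrange value standing in for irange: inclusive integer bounds.
struct toy_range
{
  int lo, hi;
  bool operator== (const toy_range &o) const { return lo == o.lo && hi == o.hi; }
};

// Union two ranges into one enclosing subrange (simplified to a single
// subrange; the real representation supports multiple subranges).
static toy_range union_ (toy_range a, toy_range b)
{
  return toy_range { std::min (a.lo, b.lo), std::max (a.hi, b.hi) };
}

// Sketch of the range-on-entry cache described above.
struct toy_ranger
{
  // incoming[bb] holds the (already computed) range of NAME on each
  // incoming edge of bb; in the real ranger this is where the edge
  // calculation and gori_map query would happen.
  std::map<int, std::vector<toy_range>> incoming;
  std::map<int, toy_range> on_entry_cache;

  toy_range range_on_entry (int bb)
  {
    auto cached = on_entry_cache.find (bb);
    if (cached != on_entry_cache.end ())
      return cached->second;          // later requests hit the cache
    toy_range r = incoming[bb][0];
    for (size_t i = 1; i < incoming[bb].size (); i++)
      r = union_ (r, incoming[bb][i]);
    on_entry_cache[bb] = r;           // memoize for future requests
    return r;
  }
};
```

For a diamond where one edge carries [11, MAX] and the other [MIN, 10], the merge at the join block yields [MIN, MAX], matching the behaviour described later in the thread.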
> > The next time a range request for NAME is made on entry to that block, it simply picks it up from the cache.
> >
> > The requests for a range on an edge use the gori_map summary to decide whether that block creates a range for NAME or not. If the block does not generate a range, it simply requests the range-on-entry to that block. If it does generate a range, then it examines the statements defining the condition, gets ranges for those, and calculates the range that would be generated on that edge. It also requests ranges for any of those names in the defining chain which originate outside this block.
> >
> > So it queries all the names which feed other names, and so on. The queries end when a statement is found that doesn't generate a useful range. Sometimes it has to go hunt quite a way, but usually they are nearby. And caching the range-on-entry values prevents the lookups from doing the same work over and over.
> >
> > Once a range has been calculated, we don't spend much more time calculating it again. Any other ssa-name which uses that name also doesn't need to recalculate it. For the most part, we walk the def chains for any given ssa-name and its feeding names once, and future requests pretty much come from the cache.
> >
> > So yes, there is also a lot of recursion in here at the moment: calls for ranges spawning other calls before they can be resolved. I suspect sometimes this call chain is deep. I don't know; we haven't performed any analysis on it yet.
> >
> > One of the advantages of a DOM-style or SSA-with-ASSERT_EXPR pass is that the number of evaluations and the size of the cache isn't "unbound" that way. On the WIKI page you mention edge info isn't cached yet - whatever that means ;)
> >
> > So I wonder what's the speed for doing
> >
> > The on-entry caches prevent it from being "unbound".
> > And the intersection of incoming edge calculations performs the equivalent merging of ranges that DOM-based processing gives you.
> >
> >   if (x > 10)
> >     A
> >   else
> >     B
> >   C
> >
> > block A has x [11, MAX]
> > block B has x [MIN, 10]
> > In block C, at the bottom of the diamond, the 2 incoming edges union those 2 ranges back to [MIN, MAX] and we know nothing about the range of X again. This accomplishes the same thing dominators would tell you.
> >
> > The edge info that "isn't cached yet" (and may not ever be) is simply the 'static' info calculated by evaluating the few statements at the bottom of the block feeding the condition. Any values coming into the block are cached. In this case it's simply looking at x > 10 and calculating the range on whichever edge. One of the original design elements called for this to be cached, but in practice I don't think it needs to be. But we haven't analyzed the performance yet... so it can only get quicker :-)
> >
> >   FOR_EACH_BB_FN (bb)
> >     FOR_EACH_STMT (stmt)
> >       FOR_EACH_SSA_USE (stmt)
> >         path_range_on_stmt (..., SSA-USE, stmt);
> >
> > assuming that it takes into account controlling predicates. That's actually what EVRP does (within a DOM walk), given it tries to simplify each stmt based on range info of the ops.
> >
> >   FOR_EACH_BB_FN (bb)
> >     FOR_EACH_STMT (stmt)
> >       path_range_on_stmt (..., stmt);
> >
> > would be sufficient. In order to calculate the range for the definition of the statement, the Ranger will query the range of each SSA operand.
> >
> > I had Aldy run his cumulative .ii times with a patch added to EVRP which adds a ranger and makes a call to path_range_stmt on each PHI and every stmt as evrp processes it.
> >
> >   evrp original time:  2025055 usec
> >   evrp + ranger:       3787678 usec
> >   -------
> >   ranger time:         1762623 usec
> >
> > So to be fair, we aren't actually doing anything and evrp is, but the actual folding of statements, which is the only additional thing EVRP is doing, is not a big time killer. This indicates that for a full-on stmt-by-stmt walk and process, we're certainly in the same time ballpark.
> >
> > A significant point is that the stmt-by-stmt walk is also not the intended target. That's old school now :-) Many clients only care about a few ranges.
> >
> > The "quick and dirty" RVRP pass only looks at the last condition in the block and catches the vast majority of what the stmt-by-stmt walk accomplishes. If timings for the RVRP pass are any indication, I would say it walks/calculates 60% of the statements in the program to calculate flow conditions. Processing a lot of arithmetic statements that can never result in a fold seems kinda pointless; leave those to constant prop (which could use the ranger as a client). The extra stmt-by-stmt opportunities found mostly appear to be PHIs that can be folded and the odd condition that doesn't feed a branch.
> >
> > EVRP also *has* to walk each statement because that is the model it uses to find ranges: build them from the top down. The ranger model does not have this need. If uses of the ranger were more prolific in other passes, I think a lot of things the existing VRP passes get on these statements would be caught in the normal processing the other passes do. We did put a stmt-by-stmt RVRP on the todo list, mostly so we could get a better apples-to-apples time comparison and to make sure we would eventually have full coverage of VRP's abilities.
> >
> > I would also note again that we have made no attempt to optimize things yet. We don't really know how much extra work we're doing that doesn't need to be redone.
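The "last condition in the block" evaluation that the quick RVRP pass relies on can be sketched as deriving edge ranges directly from the comparison. This is a toy model with assumed names (edge_range_for_gt, toy_range), not the branch's range-ops code, and it handles only the x > val case:

```cpp
#include <cassert>
#include <climits>

// Toy range: inclusive integer bounds.
struct toy_range { int lo, hi; };

// Given a block ending in "if (x > val)", compute the range of x on the
// true and false outgoing edges.  (Assumes val < INT_MAX so val + 1 does
// not overflow; the real code handles all comparison codes and types.)
static toy_range edge_range_for_gt (int val, bool true_edge)
{
  if (true_edge)
    return toy_range { val + 1, INT_MAX };   // x > 10  ->  x in [11, MAX]
  return toy_range { INT_MIN, val };         // !(x > 10) -> x in [MIN, 10]
}
```

This is exactly the 'static' evaluation mentioned earlier for the diamond example: x > 10 yields [11, MAX] on the true edge and [MIN, 10] on the false edge.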
> > We just know the current speed is a pretty good starting point. We're seeing big improvements in the passes that are changed to use it.
> >
> > We will get to finding the inefficiencies and try to address them.
> >
> > More generally, my remarks are still the same as with the irange introduction attempt.
> >
> > It's a no-no to add yet another set of range op primitives given we already have them implemented (with trees, yes...) in tree-vrp.c. The existing ones are already factored out to some extent, so I expect you will continue that and split out the wide-int cases from those existing routines. Yes - that will not be very C++-ish - but who cares. Simply implement C-style workers; for example, be inspired by the tree-ssa-ccp.c:bit_value_* routines implementing a kind of non-zero-bits operation primitives on top of wide_ints (and also wrapping those with tree variants).
> >
> > Sharing of code to do the same work was covered as something we will get to. Aldy is working on that, so there is no dual maintenance. I had started it with the wide-int-aux.[ch] file in the branch but never got very far. Yeah, it's not super pretty, but it is what it is.
> >
> > It won't be perfect, however, as many of those routines are written with the assumption that ranges are either a single range or an anti-range. We no longer have that restriction, and we will diverge in those cases.
>
> But having 1 single range or N ranges isn't different when it comes to the core working routines. Because if you have N ranges it's simply: apply the op N times and union the result. Yes - the special case is now that OP (range) can result in multiple ranges, but currently all interesting cases you likely implement are simply covered by returning either a range or an anti-range. Because more generally you'd not want N ranges but some affine thing, so you can handle even more cases without blowing up storage.
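The "apply the op N times and union the result" point can be sketched with a toy multi-subrange value. The representation and names here (multi_range, apply_op) are assumptions of this sketch, not the branch's irange; it applies a unary op to each subrange and then merges overlapping or adjacent results:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// A multi-subrange value as a sorted list of disjoint inclusive [lo, hi]
// pairs, standing in for an N-subrange irange.
typedef std::vector<std::pair<int, int>> multi_range;

// Apply a unary op to an N-subrange value: apply it to each subrange and
// union the results.  (Assumes OP is monotonic, so op([lo, hi]) is bounded
// by op(lo) and op(hi); non-monotonic ops need more care.)
static multi_range apply_op (const multi_range &r, int (*op) (int))
{
  multi_range result;
  for (const auto &sub : r)
    {
      int lo = op (sub.first), hi = op (sub.second);
      result.push_back ({ std::min (lo, hi), std::max (lo, hi) });
    }
  // Union step: sort, then merge overlapping or adjacent subranges.
  std::sort (result.begin (), result.end ());
  multi_range merged;
  for (const auto &sub : result)
    if (!merged.empty () && sub.first <= merged.back ().second + 1)
      merged.back ().second = std::max (merged.back ().second, sub.second);
    else
      merged.push_back (sub);
  return merged;
}
```

With single-range/anti-range representations the same loop degenerates to one or two iterations, which is why the core working routines need not differ.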
> > Extricating the code from the symbolic handling is also not straightforward sometimes, as there are big switches which mix and match various bits of each case.
>
> Yes. Which is why I never really got far with dumping the symbolic stuff, because it's hard to come up with a replacement. I guess in reality you want to model symbolic ranges as constraints on affine expressions and then have some constraint solver to answer queries -- at least for symbolic ranges the interesting transforms involve comparison simplifications (aka jump threading) only.
>
> > We will however try to distill it down to the wide-int aspects that are common.
> >
> > I would also argue that long term I see value_range disappearing completely... and then there won't be any dual maintenance :-)
> >
> > For most of your on-demand uses I would expect a _much_ simpler and less heavy-weight approach to be extending the recently merged determine_value_range (I've pointed you to this multiple times in the past) to walk SSA def-use chains using the existing SSA on-the-side info as a cache (yes, w/o use-granular contextual information).
> >
> > I think what we have is pretty lightweight and more powerful than that would be. I started from scratch so we could get all the ideas in one place working together and not suffer from limits imposed by existing assumptions or information not easily available. I think our results confirm those benefits. It's fast, easy to use, and it's already starting to get things EVRP does not get. It is also showing signs of finding things VRP itself has trouble with. When we handle equivalencies I would expect we will exceed current capabilities.
> >
> > I also think that purely restricting yourself to wide-ints is a mistake - we do very well want to have range information for REAL_CSTs (or composite integers / reals).
> > How do you think of extending Ranger to handle different kinds of types? Eric already raised the issue of symbolic ranges.
> >
> > This is in fact one of the main reasons we wrote everything new, including a distinct range class. We expected to want to make this work on other types.
> >
> > Irange and range-ops is a class to handle integer ranges. We have a very well defined API for ranges now. The range engine itself simply uses this API.
> >
> > If we write a class 'real_range' to manipulate & store real ranges and implement 'real_range_ops' to teach the compiler how real ranges are extracted from expressions, the entire ranger engine will work on reals.
> >
> > There are 2 options to proceed with this:
> >
> > a) we'd either template the ranger to use <class Range> instead of class irange, or at least extract out the common code and template that. And then we'd have the same on-demand ranger for "real" ranges, or "complex" or whatever. So instead of invoking it with 'path_ranger r', it'd be something like 'path_ranger<irange> r' or 'path_ranger<real_range> r'.
> >
> > I like that less because I hate templates, but it is an option. The more likely case I planned for is:
> >
> > b) adjust class irange to derive from a base range class with the API as virtual functions, and inherit irange and real_range from that for the ranger to call. Then it would handle both simultaneously. We'd have some tweaking to do where the ranger currently checks the type of things to be integral, but nothing too serious. The entire structure of the range engine will still apply, and we could process both simultaneously.
> >
> > The second model was also intended from the beginning to allow us to adjust the number of sub-ranges dynamically. Right now we set it to 3 sub-ranges at compile time, but the intention is to allow this to vary if we want.
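Option (b) above can be sketched as follows. All class and method names here (base_range, toy_int_range, toy_real_range, varying_p, empty_p) are hypothetical stand-ins invented for this sketch; the point is only the shape of the design, where the engine is written against a virtual base API so integer and real ranges flow through the same code:

```cpp
#include <cassert>
#include <climits>
#include <cmath>

// A common abstract range API that both an integer range and a "real"
// range implement, so the ranger engine only sees the base class.
class base_range
{
public:
  virtual ~base_range () {}
  virtual bool varying_p () const = 0;   // range covers the whole type
  virtual bool empty_p () const = 0;     // undefined / empty range
};

class toy_int_range : public base_range
{
  long m_lo, m_hi;
public:
  toy_int_range (long lo, long hi) : m_lo (lo), m_hi (hi) {}
  bool varying_p () const override
    { return m_lo == LONG_MIN && m_hi == LONG_MAX; }
  bool empty_p () const override { return m_lo > m_hi; }
};

class toy_real_range : public base_range
{
  double m_lo, m_hi;
public:
  toy_real_range (double lo, double hi) : m_lo (lo), m_hi (hi) {}
  bool varying_p () const override
    { return m_lo == -HUGE_VAL && m_hi == HUGE_VAL; }
  bool empty_p () const override { return m_lo > m_hi; }
};

// An "engine" routine that works for either kind through the base API.
static bool engine_sees_varying (const base_range &r)
{
  return r.varying_p ();
}
```

Compared with the template option (a), this pays a virtual call per query but lets one engine instance handle both range kinds at the same time.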
> > There are cases where some optimization might be willing to spend extra time/memory in order to get really accurate range info. Maybe switch analysis could use an irange which allowed up to 1000 subranges, so it could very precisely map what the value is at any given point. The general case doesn't need that extra memory usage, however.
> >
> > It just gives us flexibility we don't currently have. We also expect to do some tests and find out what the "optimum" default number of ranges is. I chose 3 because it was handling cases I wanted to get that we couldn't get before. Maybe a different number is better; we'd have to run some experiments to see how often we could use more ranges.
> >
> > Yes, I do like having wide-int workers for ranges.
> >
> > Yes, I do like the idea of making ranges not restricted to single sub-ranges (didn't review the implementation).
> >
> > But those things should have been done on the existing code, not sticking sth completely new into GCC that cannot subsume the existing stuff. That's the road to maintenance hell which we unfortunately walk way too often with GCC :/
> >
> > I argue that this will subsume all the existing code related to it. Sure, we cannot right this moment; it's not a small task. I think our pathway out of maintenance hell involves removing the existing value_range code and the various incarnations of VRP we have, and using something modern designed from scratch to solve these problems. I would also predict we could do it in the release after this next one. So we'd have one release with the ranger and VRP brothers both present, and after that it would be just the Ranger.
> >
> > Btw, the original goal of EVRP was to get rid of the ASSERT_EXPR-style VRP pass once the threading bits it has are subsumed by the backwards threader. Does ranger at least allow that - get rid of the threading inside VRP without regressions?
> >
> > The ranger calculates ranges. It does not implement a complete VRP replacement. We will implement a full VRP in time. One of the original goals was to get this working so we can remove all the micro-optimizations that VRP does because range info isn't available anywhere else. The clean goal is that all the threading-related range work should be done in threading via calls to the ranger.
> >
> > So to more directly answer the question: I do not know the state today. I'm not the threader dude :-) Aldy enhanced the backward threader to do some things we couldn't do before, and converted those other 3 passes. I think Jeff has on his plate to remove the threading parts from VRP and put them back in the threader, so he can clarify this. By the end of the stage I would expect that threader work to be complete and out of VRP.
> >
> > Just as a final note - open source development doesn't work like this, citing you: "I'd like to introduce a project we've been working on for the past year and a half." - this simply provokes "reviews" like the above.
> >
> > Well, it is what it is. I see nothing wrong with the "review"; quite frankly, it's pretty much as expected. I think it would have been almost identical if I'd talked about it last year.
> >
> > And why can't it work like this? We aren't suggesting we shove this into trunk right now. We've been experimenting with the technology, and now we think it's developed to the point where it can be of benefit, and we can demonstrate it works. It's on a branch, and if anyone else wants to help, awesome. We have a to-do list. It'll probably be months before we're even ready to be considered for a merge. It has now advanced to the point where we have tangible things to talk about for the first time.
> >
> > Sure, we could replace irange with value_range and range-ops with some serious reconfiguration of the range extraction code, and the ranger would still work. It just needs the same range API. I happen to think irange and range-ops offer benefits that value_range can't, and so to do that would be a step backwards.
> >
> > I chose to create irange so we had the flexibility to enhance its ability in the future while offering some improvements now. Range-ops was a desire to unite the range processing code in one place, and I needed to add the ability to do algebraic rearrangements that we currently don't do in VRP... the op1_irange and op2_irange code that derives ranges for operands given a LHS and the other operand. Yeah, it's not in one place yet, but it will be.
> >
> > So at what point do we decide community involvement is appropriate? As soon as I get the idea? Maybe after I had the third prototype sort-of working, indicating the approach was feasible? It was pretty ugly then; I wouldn't want anyone to have seen that :-) All the exact same questions would have come up then as now: we need to handle symbolics; too expensive; reuse the existing range code. Rather than waving my hands to answer and not really accomplishing anything, I can now point to tangible things. (mostly :-) I would have still pushed for irange and range-ops because I think they are the right strategic solution. Maybe someone else could have helped advance things a little faster, but I was a pretty big bottleneck on design and reworking things. Ask Aldy.
> >
> > The only thing that we ever had working that was even worth considering submitting was the irange work last July, and that did not get very far. We could also have waited another year until we had a complete VRP replacement working.
> > Then it's a drop-in solution that removes a lot of the arguments about code sharing and not re-using value_range. That might have been even easier! I do think it's reached the point where there is a lot of benefit to be derived now, and so here it is; let's figure out what we can do with it.
> >
> > Anyway, the "lateness" of the introduction is purely on me. I like to get things working before talking about them, especially when they start with half-baked ideas.
>
> I would have liked to see factoring out from the current code first, and actually enhancing the current value-range representation by removing the restriction on single ranges.
>
> I don't very much believe in the "start from scratch" approach. Because history shows we always end up with both, the old and the new, afterwards.
>
> Richard.
>
> > I think this is a good strategic direction to go: it demonstrates benefit now, and has a path to a better, easier-to-maintain VRP with a compiler-wide range resource.
> >
> > Andrew