On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> When a request is made for the range of an ssa_name at some location,
> the first thing we do is invoke range_of_stmt() to ensure we have
> looked at the definition and have an evaluation for the name at a
> global level.  I recently added a patch which dramatically reduces the
> call stack requirements for that call.
>
> Once we have confirmed the definition range has been set, a call is
> made for the range-on-entry to the block of the use.  This is
> performed by the cache, which proceeds to walk the CFG predecessors
> looking for where ranges are created (exported), existing
> range-on-entry cache hits, or the definition block.  Once this list of
> predecessors has been created, a forward walk is done, pushing the
> ranges through the successor edges of all the blocks that were visited
> in the initial walk.
>
> If the use is far from the definition, we end up filling a lot of the
> same value on these paths.  Likewise, uses which are far from a
> range-modifying statement push the same value into a lot of blocks.
>
> This patch tries to address at least some of these inefficiencies.
> It recognizes two things:
>
> First, if there is no range-modifying stmt between this use and the
> last range we saw in a dominating block, we can just use the value
> from the dominating block and not fill in all the cache entries
> between here and there.  This is the biggest win.
>
> Second, if there is a range-modifying statement at the end of some
> block, we will have to do the appropriate cache walk to this point,
> but it's possible the range-on-entry to THAT block might be able to
> use a dominating range, and we can prevent the walk from going any
> further than this block.
>
> Combined, this should prevent a lot of unnecessary ranges from being
> plugged into the cache.
>
> i.e., to visualize:
>
> bb4:
>   a = foo()
> <..>
> bb60:
>   if (a < 30)
> <...>
> bb110:
>   g = a + 10
>
> If the first place we ask for a is in bb110, we walk the CFG from
> bb110 all the way back to bb4, on all paths leading back, and then
> fill all those cache entries.
> With this patch:
> a) if bb60 does not dominate bb110, the request will scan the
> dominators, arrive at the definition block, have seen no range
> modifier, and simply set the on-entry range for bb110 to the range
> of a.  Done.
> b) if bb60 does dominate bb110, we have no idea which edge out of bb60
> dominates it, so we will revert to the existing cache algorithm.
> Before doing so, it checks and determines that there are no modifiers
> between bb60 and the def in bb4, and so sets the on-entry cache for
> bb60 to be the range of a.  Now when we do the cache-fill walk, it
> only has to go back as far as bb60 instead of all the way to bb4.
>
> Otherwise we just revert to what we do now (likewise if dominators are
> not available).  I have yet to see a case where we miss something we
> used to get, but that does not mean there isn't one :-).
>
> The cumulative performance impact of this when compiling a set of 390
> GCC source files at -O2 (measured via callgrind) is pretty decent.
> Negative numbers are a compile-time decrease, so -10% means a 10%
> faster compilation time:
>
>   EVRP                   : %change from trunk is -26.31% (!)
>   VRP2                   : %change from trunk is -9.53%
>   thread_jumps_full      : %change from trunk is -15.8%
>   Total compilation time : %change from trunk is -1.06%
>
> So it's not insignificant.
>
> Risk would be very low, unless dominators are screwed up mid-pass...
> but then the relation code would be screwed up too.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?
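
As a concrete illustration, a minimal test case with roughly the shape of
the bb4/bb60/bb110 sketch above could look like the following; the
function names are made up and the exact block numbering GCC assigns will
of course differ.

  /* Hypothetical shape only: 'a' is defined far from its final use,
     with a range-modifying condition (a < 30) in between, mirroring
     the bb4 / bb60 / bb110 picture in the quoted mail.  */
  extern int foo ();
  extern void consume (int);

  int
  test (int n)
  {
    int a = foo ();       /* Definition of 'a' (think bb4).  */
    for (int i = 0; i < n; i++)
      consume (i);        /* Intervening blocks saying nothing about 'a'.  */
    if (a < 30)           /* Range-modifying condition (think bb60).  */
      consume (a);
    int g = a + 10;       /* Distant use of 'a' (think bb110).  */
    return g;
  }

With the patch, the range-on-entry request for 'a' at the final use
should largely be resolved by walking the dominator chain instead of
filling cache entries for every block of the loop.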
OK.  Wow - I didn't realize we have this backwards CFG walk w/o
dominators ... I wonder if you can add a counter to visualize the places
where we end up using this path.  I suggest adding
statistics_counter_event (cfun, "slow range query", 1); or so and seeing
with -fdump-statistics-stats which passes eventually trigger that (I
suspect none?).

Thanks,
Richard.

> Andrew
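
A rough sketch of where such a counter might go, assuming a hypothetical
spot in the cache code where the dominator shortcut gives up and the full
backwards walk begins; statistics_counter_event and cfun are the existing
GCC facilities, everything around them is illustrative only:

  /* Hypothetical placement, not the actual ranger cache code.  */
  if (!range_from_dominators_available)  /* Shortcut failed, full walk.  */
    statistics_counter_event (cfun, "slow range query", 1);

Compiling with -fdump-statistics-stats should then show, per pass, how
often the slow path is still taken.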