On Fri, Dec 3, 2021 at 9:42 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> When a request is made for the range of an ssa_name at some location,
> the first thing we do is invoke range_of_stmt() to ensure we have looked
> at the definition and have an evaluation for the name at a global
> level.  I recently added a patch which dramatically reduces the call
> stack requirements for that call.
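>
> To make the ordering concrete, a query for the range of NAME at a use
> roughly does the following (pseudocode only, not the exact ranger
> interface; the argument names are illustrative):
>
>    range_of_stmt (r, SSA_NAME_DEF_STMT (name), name); // global range at def
>    range_on_entry (r, block_of_use, name);            // refine at the use block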
>
> Once we have confirmed the definition range has been set, a call is made
> for the range-on-entry to the block of the use.  This is performed by
> the cache, which proceeds to walk the CFG predecessors looking for
> blocks where ranges are created (exported), existing range-on-entry
> cache hits, or the definition block.  Once this list of predecessors
> has been created, a forward walk is done, pushing the ranges through
> the successor edges of all the blocks that were visited in the initial walk.
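>
> Roughly, that fill behaves like the toy model below (a self-contained
> sketch written for this mail, not the actual ranger cache code; "int"
> stands in for a real range, and the forward fill is simplified):
>
>    #include <map>
>    #include <queue>
>    #include <set>
>    #include <vector>
>
>    struct block { int index; std::vector<block *> preds; };
>
>    static std::map<int, int> on_entry_cache;  // block index -> range
>
>    // Fill the on-entry cache for every block between the use and the def.
>    static void
>    fill_cache (block *use_bb, block *def_bb, int range)
>    {
>      // Phase 1: walk backwards from the use, collecting blocks until we
>      // hit the definition block or an existing on-entry cache entry.
>      std::set<int> visited;
>      std::queue<block *> work;
>      work.push (use_bb);
>      while (!work.empty ())
>        {
>          block *bb = work.front ();
>          work.pop ();
>          if (bb == def_bb || on_entry_cache.count (bb->index))
>            continue;
>          if (!visited.insert (bb->index).second)
>            continue;
>          for (block *p : bb->preds)
>            work.push (p);
>        }
>      // Phase 2: forward fill (the real walk pushes ranges across
>      // successor edges).  Every block visited above gets an entry,
>      // which is where all the identical cache entries come from.
>      for (int idx : visited)
>        on_entry_cache[idx] = range;
>    }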
>
> If the use is far from the definition, we end up filling a lot of the
> same value on these paths.  Also uses which are far from a
> range-modifying statement push the same value into a lot of blocks.
>
> This patch tries to address at least some of these inefficiencies.  It
> recognizes two situations:
>
> First, if there is no range-modifying stmt between this use and the last
> range we saw in a dominating block, we can just use the value from the
> dominating block and not fill in all the cache entries between here and
> there.  This is the biggest win.
>
> Second, if there is a range-modifying statement at the end of some
> block, we will have to do the appropriate cache walk to that point, but
> it's possible the range-on-entry to THAT block might be able to use a
> dominating range, and we can prevent the walk from going any further
> than this block.
>
> Combined, this should prevent a lot of unnecessary ranges from being
> plugged into the cache.
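>
> A toy model of the dominator check used by both cases (again a sketch
> invented for this mail, not the real implementation; all names are made
> up):
>
>    struct dom_block
>    {
>      int index;
>      dom_block *idom;       // immediate dominator, NULL at the entry block
>      bool exports_range;    // true if the block generates a new range for
>                             // the name, e.g. it ends in "if (a < 30)"
>    };
>
>    // Walk the dominators from the use towards the def.  Returns def_bb
>    // if nothing on the path modifies the range -- the use block can then
>    // simply reuse the def's range (the first case).  Otherwise returns
>    // the nearest dominating modifier; the normal cache walk still runs,
>    // but that block's on-entry range can be seeded the same way, so the
>    // walk stops there instead of going all the way back to the def (the
>    // second case).  Assumes def_bb dominates use_bb and they differ.
>    static dom_block *
>    find_range_source (dom_block *use_bb, dom_block *def_bb)
>    {
>      for (dom_block *bb = use_bb->idom; bb && bb != def_bb; bb = bb->idom)
>        if (bb->exports_range)
>          return bb;
>      return def_bb;
>    }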
>
> i.e., to visualize:
>
> bb4:
>    a = foo()
> <...>
> bb60:
>    if (a < 30)
> <...>
> bb110:
>    g = a + 10
>
> If the first place we ask for a is in bb110, we walk the CFG from bb110
> all the way back to bb4, on all paths leading back, then fill all those
> cache entries.
> With this patch:
>    a) if bb60 does not dominate bb110, the request will scan the
> dominators, arrive at the definition block having seen no range
> modifier, and simply set the on-entry range for bb110 to the range of a.
> Done.
>    b) if bb60 does dominate bb110, we have no idea which edge out of
> bb60 dominates it, so we revert to the existing cache algorithm.  Before
> doing so, it checks and determines that there are no modifiers between
> bb60 and the def in bb4, and so sets the on-entry cache for bb60 to be
> the range of a.  Now when we do the cache-fill walk, it only has to go
> back as far as bb60 instead of all the way to bb4.
>
> Otherwise (or if dominators are not available), we just revert to what
> we do now.  I have yet to see a case where we miss something we used to
> get, but that does not mean there isn't one :-).
>
> The cumulative performance impact of this when compiling a set of 390
> GCC source files at -O2 (measured via callgrind) is pretty decent.
> Negative numbers are a compile-time decrease, so -10% means 10% faster
> compilation time.
>
> EVRP     : %change from trunk is -26.31% (!)
> VRP2     : %change from trunk is -9.53%
> thread_jumps_full   : %change from trunk is -15.8%
> Total compilation time  : %change from trunk is -1.06%
>
> So it's not insignificant.
>
> Risk would be very low, unless dominators are screwed up mid-pass... but
> then the relation code would be screwed up too.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK?

OK.

Wow - I didn't realize we have this backwards CFG walk w/o dominators ...
I wonder if you can add a counter to visualize places we end up using this path.
I suggest adding statistics_counter_event (cfun, "slow range query", 1); or so
and see with -fdump-statistics-stats which passes eventually trigger that
(I suspect none?).
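
For example, something like this at the point where the non-dominator
walk starts (placement purely illustrative):

   /* About to fall back to the backwards CFG walk.  */
   statistics_counter_event (cfun, "slow range query", 1);

and then -fdump-statistics-stats should show per-pass counts for
"slow range query" in the statistics dump.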

Thanks,
Richard.

> Andrew
