On Mon, May 2, 2022 at 8:30 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Fri, Apr 29, 2022 at 6:22 PM Aldy Hernandez <al...@redhat.com> wrote:
> >
> > On Fri, Apr 29, 2022 at 12:13 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Fri, Apr 29, 2022 at 11:53 AM Aldy Hernandez <al...@redhat.com> wrote:
> > > >
> > > > On Fri, Apr 29, 2022 at 9:09 AM Richard Biener
> > > > <richard.guent...@gmail.com> wrote:
> > > > >
> > > > > On Thu, Apr 28, 2022 at 8:44 PM Jeff Law via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 4/28/2022 10:30 AM, Aldy Hernandez wrote:
> > > > > > > [Jeff, this is the same patch I sent you last week with minor 
> > > > > > > tweaks
> > > > > > > to the commit message.]
> > > > > > And I just dropped it into the tester.  We should have the vast 
> > > > > > majority
> > > > > > of targets tested by this time tomorrow.
> > > > > >
> > > > > > >
> > > > > > > [Despite the verbosity of the message, this is actually a pretty
> > > > > > > straightforward patch.  It should've gone in last cycle, but there
> > > > > > > was a nagging regression I couldn't get to until after stage1
> > > > > > > had closed.]
> > > > > > You had other, non-work/gcc priorities.  No worries at missing 
> > > > > > gcc-12.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > There are 3 uses of EVRP in DOM that must be converted.
> > > > > > > Unfortunately, they need to be converted in one go, so further
> > > > > > > splitting of this patch would be problematic.
> > > > > > ACK
> > > > > >
> > > > > >
> > > > > > > There's nothing here earth shattering.  It's all pretty obvious in
> > > > > > > retrospect, but I've added a short description of each use to aid 
> > > > > > > in
> > > > > > > reviewing:
> > > > > > >
> > > > > > > * Convert evrp use in cprop to ranger.
> > > > > > >
> > > > > > >    This is easy, as cprop in DOM was converted to the ranger API 
> > > > > > > last
> > > > > > >    cycle, so this is just a matter of using a ranger instead of an
> > > > > > >    evrp_range_analyzer.
> > > > > > Yea.  We may have covered this before, but what does ranger do with 
> > > > > > a
> > > > > > conditional equivalence?
> > > > > >
> > > > > > ie if we have something like
> > > > > >
> > > > > > if (a == b)
> > > > > >    {
> > > > > >      blah blah
> > > > > >    }
> > > > > >
> > > > > > Within the true arm we know there's an equivalence.  *But* we don't 
> > > > > > want
> > > > > > to just blindly replace a with b or vice-versa.  The equivalence is
> > > > > > primarily useful for simplification rather than propagation.
> > > > > >
> > > > > > In fact, we every much do not want to cprop in these cases.   It 
> > > > > > rarely
> > > > > > helps in a meaningful way and there's no good way to know which is 
> > > > > > the
> > > > > > better value to use.  Canonicalization on SSA_NAMEs doesn't  really 
> > > > > > help
> > > > > > due to SSA_NAME recycling.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > * Convert evrp use in threader to ranger.
> > > > > > >
> > > > > > >    The idea here is to use the hybrid approach we used for the 
> > > > > > > initial
> > > > > > >    VRP threader conversion last cycle.  The DOM threader will 
> > > > > > > continue
> > > > > > >    using the forward threader infrastructure while continuing to 
> > > > > > > query
> > > > > > >    DOM data structures, and only if the conditional does not 
> > > > > > > relsolve,
> > > > > > >    using the ranger.  This gives us the best of both worlds, and 
> > > > > > > is a
> > > > > > >    proven approach.
> > > > > > >
> > > > > > >    Furthermore, as frange and prange come live in the next cycle, 
> > > > > > > we
> > > > > > >    can move away from the forward threader altogether, and just 
> > > > > > > add
> > > > > > >    another backward threader.  This will not only remove the last 
> > > > > > > use
> > > > > > >    of the forward threader, but will allow us to remove at least 
> > > > > > > 1 or 2
> > > > > > >    threader instances.
> > > > > > Excellent.
> > > > > >
> > > > > > >
> > > > > > > * Convert conditional folding to use the method used by the 
> > > > > > > ranger and
> > > > > > >    evrp.  Previously DOM was calling into the guts of
> > > > > > >    simplify_using_ranges::vrp_visit_cond_stmt.  The blessed way 
> > > > > > > now is
> > > > > > >    using fold_cond() which rewrites the conditional and edges
> > > > > > >    automatically.
> > > > > > >
> > > > > > >    When legacy is removed, simplify_using_ranges will be further
> > > > > > >    cleaned up, and there will only be one entry point into 
> > > > > > > simplifying
> > > > > > >    a statement.
> > > > > > Sure.
> > > > > >
> > > > > > >
> > > > > > > * DOM was setting global ranges determined from unreachable edges 
> > > > > > > as a
> > > > > > >    side-effect of using the evrp engine.  We must handle these 
> > > > > > > cases
> > > > > > >    before nuking evrp, and DOM seems like a good fit.  I've just 
> > > > > > > moved
> > > > > > >    the snippet to DOM, but it could live anywhere else we do a DOM
> > > > > > >    walk.
> > > > > >   It was a semi-desirable to record/refine global ranges in DOM, 
> > > > > > but I'd
> > > > > > be surprised if too much code depended on that.  Presumably there's 
> > > > > > a
> > > > > > testcase in the suite which we don't handle well if we don't let DOM
> > > > > > refine/record global ranges?
> > > > > >
> > > > > > >
> > > > > > >    For the record, this is the case *vrp handled:
> > > > > > >
> > > > > > >       <bb C>:
> > > > > > >       ...
> > > > > > >       if (c_5(D) != 5)
> > > > > > >       goto <bb N>;
> > > > > > >       else
> > > > > > >       goto <bb M>;
> > > > > > >       <bb N>:
> > > > > > >       __builtin_unreachable ();
> > > > > > >       <bb M>:
> > > > > > >
> > > > > > >    If M dominates all uses of c_5, we can set the global range of 
> > > > > > > c_5
> > > > > > >    to [5,5].
> > > > > > And that will only happen if M eventually loops back to the 
> > > > > > conditional,
> > > > > > right?  Otherwise all the uses wouldn't be dominated. I suspect 
> > > > > > this is
> > > > > > exceedingly rare in practice an it looks really familiar.  This is
> > > > > > coming from a testcase, right?
> > > > >
> > > > > This is how we make use of assert() when it uses 
> > > > > __builtin_unreachable.
> > > > >
> > > > > Note the difficulty here is that we have to do this at a very 
> > > > > specific point
> > > > > in the pass pipeline (setting the global range), since the first time 
> > > > > we'll
> > > > > fold the if (c_5(D) != 5) it will be folded to false and the IL 
> > > > > representation
> > > > > of the assert() is gone.
> > > > >
> > > > > So the idea was that with the IL present value-range analysis can 
> > > > > determine
> > > > > the range omf c_5(D) on the false path but once we remove it (and we 
> > > > > do
> > > > > want to remove it) we want to put a global range on c_5(D).
> > > > >
> > > > > That means the best of both worlds would be to detect this pattern at
> > > > > a specific point in the pass pipeline (somewhen before loop opts for 
> > > > > sure)
> > > > > and do both - remove the IL _and_ set the global range.  The fear of
> > > > > doing this too early is of course that we lose the range by copy 
> > > > > propagation
> > > > > or similar transforms.  The fear of doing this change at a single 
> > > > > place only
> > > > > is that we miss it because there's a stray use before the conditional.
> > > > >
> > > > > Note DOM is also run in pass_ipa_oacc_kernels which may be too early.
> > > > >
> > > > > In the end we want to perform dead code elimination here so maybe
> > > > > DCE is the best fit to do this (but as said we want to avoid doing 
> > > > > this too
> > > > > early).
> > > >
> > > > I agree that doing both would be the ideal scenario.  However, I would
> > > > prefer this be done in a follow-up patch to minimize the amount of
> > > > stuff being changed in one go.
> > > >
> > > > To provide additional background, this was being done in evrp which
> > > > runs at -O2, but was also being done at -O1 as a side-effect of DOM
> > > > using the evrp engine and exporting global ranges.  For -O2, I don't
> > > > know how much DOM was originally (prior to ranger) contributing to
> > > > these assert global ranges,  since evrp was probably picking them up
> > > > first.  However, since evrp now runs in ranger mode, this
> > > > functionality has silently moved to DOM (without us realizing) for all
> > > > compilation levels.
> > >
> > > I see.
> > >
> > > > Do we care about this functionality at -O1?  ISTR at least one -O1
> > > > testcase failing in RTL land if exporting global ranges isn't done in
> > > > DOM.  I don't remember if it was a global range due to a
> > > > __builtin_unreachable or otherwise.
> > >
> > > Well, at -O1 we definitely didn't set the global range (did we?) but
> > > eventually pruned paths running into __builtin_unreachable ()
> > > anyway (would have to check at which point).
> >
> > I think we always have:
> >
> > evrp_range_analyzer analyzer (/*update_global_ranges=*/true);
> >
> > ??
> >
> > DOM has been running as an incognito evrp even at -O1.  I don't know
> > if this has been by design, or as an unintended side-effect.  I mean,
> > it has been visiting all conditionals in the IL as well as calling
> > evrp_range_analyzer::record_ranges_from_stmt() on each statement it
> > tries to optimize.  And of course, updating global ranges along the
> > way.
>
> As said, it wasn't intended to be that way originally.  But yes, as we
> separated out evrp_range_analyzer and made use of it in places not
> guarded by -ftree-vrp this might have happened.
>
> We might want to reconsider (based on now long-standing facts),
> and maybe alter the -ftree-vrp documentation indicating that even
> when not enabled effecive value-range propagation and optimization
> might happen.
>
> > >
> > > >
> > > > One more thing, we have batted around the idea of running a fast evrp
> > > > even at -O1 (ranger without looking at back edges, IIRC).  This would
> > > > mean that when the DOM threader becomes disentangled from DOM (when we
> > > > have frange and prange), there's very little DOM will be doing that
> > > > for instance PRE can't get.  So perhaps, if we remove DOM, the place
> > > > to put this optimization is in the fast evrp, and fold the conditional
> > > > as has been suggested?  Anyways... I'm digressing.
> > >
> > > The idea is to get rid of DOM once its threading is separated out.
> >
> > Excellent.  There's no reason why we couldn't separate threading out
> > in this release.
> >
> > >
> > > I think doing O (n * log n) value range analysis at -O1 would be welcome.
> > > We didn't get around doing that for EVRP which I think had such
> > > bound - the iterating VRP definitely didn't.  I'm not sure ranger really
> > > qualifies given gori and all it relies on.  The advantage of the simple
> > > whole-IL walk way was to reasonably easily guarantee such bound.
> > > That's unfortunately lost now :/
> >
> > Andrew was mumbling something about a fast ranger mode for this
> > release that should be on par with legacy evrp.  IIRC it would be
> > purely DOM based, won't visit back edges, and there's no caching.  But
> > he'll have to expand on it when he returns from vacation.  I don't
> > know the details.
>
> I would guess the stmt analysis building blocks (whatever API part of
> ranger that is) can be used to produce something like that.  But in the
> end it would be the old EVRP pass with the VRP stmt analysis it
> re-used replaced with the appropriate ranger parts.
>
> But yes, I'd welcome that.  I'd also like to revisit integration of some
> of this with value-numbering which doesn't do a DOM walk but instead
> a RPO walk.
>
> > >
> > > > Doing this optimization in DOM at least keeps with GCC 12, but I'm
> > > > happy to move it to whatever is recommended.
> > >
> > > As said the intent of this trick is to preserve range information from
> > > asserts even when we removed the IL carrying it.  When we care
> > > about range info, which means when some VRP pass runs, which
> > > is why it was done in such a pass.  That is, we probably want to
> > > avoid altering the global range for -fno-tree-vrp.
> >
> > Not just -fno-tree-vrp then.  There are other range consumers.  For
> > example, CCP, threading, etc.  Threading will look at global ranges
> > even in dumb mode, and CCP uses global ranges in a round about way--
> > by looking at global nonzero bits which the evrp code diligently sets:
> >
> >         {
> >           set_ssa_range_info (vrs[i].first, vrs[i].second);
> >           maybe_set_nonzero_bits (pred_e, vrs[i].first);
> >         }
> >
> > For example, in this particular testcase, DOM will set the global
> > range, and CCP will remove at least one of the conditionals by looking
> > at nonzero bits.
>
> Yes, as said we might want to reconsider and adjust documentation.
> In theory we could gate the global range/nonzero setters on
> flag_tree_vrp so they become no-ops when not enabled
> (there is also flag_tree_bit_ccp which is documented to track
> alignment, it fails to mention non-zero bits on integers).

Well, we could just not update global ranges for !flag_tree_vrp.  In
the old world:

evrp_range_analyzer (/*update_global_ranges=*/ flag_tree_vrp)

and in the new world:

if (flag_tree_vrp)
  ranger->export_global_ranges().

This should fit in nicely with my plan to integrate nonzero bits into
the irange, because we won't have to worry about them independently

>
> But let's say the plan is to more conciously enable some range
> propagation at -O1.

Sounds like a plan.

Aldy

Reply via email to