Re: [COMMITTED] PR tree-optimization/119712 - Always reflect lower bits from mask in subranges.

Richard Biener Tue, 29 Apr 2025 23:57:34 -0700

On Wed, Apr 30, 2025 at 12:00 AM Andrew MacLeod <amacl...@redhat.com> wrote:
>
>
> On 4/28/25 17:26, Andrew MacLeod wrote:
> > I have committed this patch to trunk after bootstrap/regression
> > testing again on trunk.
> >
> > I'll get to gcc14/15 once I flush the current queue.
> >
> > Andrew
> >
> >
> > On 4/17/25 06:44, Richard Biener wrote:
> >> On Wed, Apr 16, 2025 at 10:55 PM Andrew MacLeod <amacl...@redhat.com>
> >> wrote:
> >>> This was a fun one!   An actual bug, and it took a while to sort out.
> >>> After chasing down some red herrings, this turns out to be an issue of
> >>> interaction between the range and value masks and intervening
> >>> calculations.
> >>>
> >>> The original patch from 11/2023 adjusts intersection so that it can
> >>> enhance subranges based on the value mask.  ie in this testcase
> >>>
> >>> [irange] int [-INF, 2147483644] MASK 0xfffffffc VALUE 0x1
> >>>
> >>>    If adjust_range() were called on this, it would eliminate the
> >>> trailing
> >>> mask/value bit ranges that are invalid and turn it into :
> >>>
> >>> [-INF, -3][1, 1][4, 2147483626] MASK 0xfffffffc VALUE 0x1
> >>>
> >>> reflecting the lower bits into the range.   The problem develops
> >>> because
> >>> we only apply adjust_range ()  during intersection in an attempt to
> >>> avoid expensive work when it isnt needed.
> >>>
> >>> Unfortunately, that is what triggers this infinite loop. Rangers cache
> >>> propagates changes, and the algorithm is designed to always improve the
> >>> range.  In this case, the first iteration through, _11 receives the
> >>> above value, [irange] int [-INF, 2147483644] MASK 0xfffffffc VALUE 0x1
> >>> which via the mask, excludes 0, 2 and 3.
> >>>
> >>> The ensuing calculations in block 7 do not trigger a successful
> >>> intersection operation, and thus the range pairs are never expanded to
> >>> eliminate the lower ranges, and it triggers another change in values
> >>> which leads to the next iteration being less precise, but not obviously
> >>> so. [irange] int [-INF, 2147483644] MASK 0xfffffffd VALUE 0x0 is a
> >>> result of the calculation.   As ranges as suppose to always get better
> >>> with this algorithm, we simply compare for difference.. and this range
> >>> is different, and thus we replace it. It only excludes 2 and 3.
> >>>
> >>> Next iteration through the less precise range DOES trigger an
> >>> intersection operation in block 7, and when that is expanded to
> >>> [irange]
> >>> int [-INF, 1][4, 2147483644] MASK 0xfffffffd VALUE 0x0 using that we
> >>> can
> >>> again create the more precise range for _11 that started the cycle. and
> >>> we go on and on and on.
> >>>
> >>> If we fix this so that we always expand subranges to reflect the lower
> >>> bits in a bitmask, the initial value starts with
> >>>
> >>> [irange] int [-INF, -3][1, 1][4, 2147483644] MASK 0xfffffffc VALUE 0x1
> >>>
> >>> And everything falls into place as it should.  The fix is to be
> >>> consistent about expanding those lower subranges.
> >>>
> >>> I also added a couple of minor performance tweaks to avoid unnecessary
> >>> work, along with removing adjust_range () directly into
> >>> set_range_from_bitmask () .
> >>>
> >>> I started at a 0.2% overall compilation increase (1.8% in VRP). In the
> >>> end, this patch is down to 0.6% in VRP, and only 0.08% overall, so
> >>> manageable for all the extra work.
> >>>
> >>> It also causes a few ripples in the testsuite so 3 test cases also
> >>> needed adjustment:
> >>>
> >>>    * gcc.dg/pr83072-2.c :  With the addition of the expanded ranges,
> >>> CCP
> >>> use to export a global:
> >>>       Global Exported: c_3 = [irange] int [-INF, +INF] MASK 0xfffffffe
> >>> VALUE 0x1
> >>> and now
> >>>      Global Exported: c_3 = [irange] int [-INF, -1][1, +INF] MASK
> >>> 0xfffffffe VALUE 0x1
> >>> Which in turn enables forwprop to collapse part of the testcase much
> >>> earlier.     So I turned off forwprop for the testcase
> >>>
> >>> * gcc.dg/tree-ssa/phi-opt-value-5.c  : WIth the expanded ranges, CCP2
> >>> pass use to export:
> >>>      Global Exported: d_3 = [irange] int [-INF, +INF] MASK 0xfffffffe
> >>> VALUE 0x1
> >>> and now
> >>>      Global Exported: d_3 = [irange] int [-INF, -1][1, +INF] MASK
> >>> 0xfffffffe VALUE 0x1
> >>> which in turn makes the following comment obsolete as the optimization
> >>> does happen earlier.:
> >>> /* fdiv1 requires until later than phiopt2 to be able to detect that
> >>>      d is non-zero. to be able to remove the conditional.  */
> >>> Adjusted the testcase to expect everything to be taken care of by
> >>> phi-opt2 pass.
> >>>
> >>>    * gcc.dg/tree-ssa/vrp122.c : Previously, CCP exported:
> >>>      Global Exported: g_4 = [irange] unsigned int [0, +INF] MASK
> >>> 0xfffffff0 VALUE 0x0
> >>> and then EVRP refined that and stored it, then the testcase tested for:
> >>>      Global Exported: g_4 = [irange] unsigned int [0, 0][16, +INF] MASK
> >>> 0xfffffff0 VALUE 0x0
> >>> Now, CCP itself exported the expanded range, so there is nothing for
> >>> VRP
> >>> to do.
> >>> adjusted the testcase to look for the expanded range in CCP.
> >>>
> >>> Now we never get into this situation where the bitmask is explicitly
> >>> applied in some places and not others.
> >>>
> >>> Bootstraps on x86_64-pc-linux-gnu with no regressions. Finally.   Is
> >>> this OK for trunk, or should I hold off a little bit?
> >> Please wait a little bit until after 15.1 is out.  It's then OK for
> >> trunk in
> >> stage1 and backports when no issues show up.
> >>
> >> Thanks,
> >> Richard.
> >>
> >>> Andrew
> This is now in trunk.   Attached are the patches for gcc15 and gcc14.
>
> Bootstrapped with no regressions on x86_64-pc-linux-gnu.
>
> Do you want me to check it in for either or both branches?


OK for 15.

Richard.

>
> Andrew

Re: [COMMITTED] PR tree-optimization/119712 - Always reflect lower bits from mask in subranges.

Reply via email to