On Wed, Apr 30, 2025 at 12:00 AM Andrew MacLeod <amacl...@redhat.com> wrote: > > > On 4/28/25 17:26, Andrew MacLeod wrote: > > I have committed this patch to trunk after bootstrap/regression > > testing again on trunk. > > > > I'll get to gcc14/15 once I flush the current queue. > > > > Andrew > > > > > > On 4/17/25 06:44, Richard Biener wrote: > >> On Wed, Apr 16, 2025 at 10:55 PM Andrew MacLeod <amacl...@redhat.com> > >> wrote: > >>> This was a fun one! An actual bug, and it took a while to sort out. > >>> After chasing down some red herrings, this turns out to be an issue of > >>> interaction between the range and value masks and intervening > >>> calculations. > >>> > >>> The original patch from 11/2023 adjusts intersection so that it can > >>> enhance subranges based on the value mask. ie in this testcase > >>> > >>> [irange] int [-INF, 2147483644] MASK 0xfffffffc VALUE 0x1 > >>> > >>> If adjust_range() were called on this, it would eliminate the > >>> trailing > >>> mask/value bit ranges that are invalid and turn it into : > >>> > >>> [-INF, -3][1, 1][4, 2147483626] MASK 0xfffffffc VALUE 0x1 > >>> > >>> reflecting the lower bits into the range. The problem develops > >>> because > >>> we only apply adjust_range () during intersection in an attempt to > >>> avoid expensive work when it isnt needed. > >>> > >>> Unfortunately, that is what triggers this infinite loop. Rangers cache > >>> propagates changes, and the algorithm is designed to always improve the > >>> range. In this case, the first iteration through, _11 receives the > >>> above value, [irange] int [-INF, 2147483644] MASK 0xfffffffc VALUE 0x1 > >>> which via the mask, excludes 0, 2 and 3. > >>> > >>> The ensuing calculations in block 7 do not trigger a successful > >>> intersection operation, and thus the range pairs are never expanded to > >>> eliminate the lower ranges, and it triggers another change in values > >>> which leads to the next iteration being less precise, but not obviously > >>> so. [irange] int [-INF, 2147483644] MASK 0xfffffffd VALUE 0x0 is a > >>> result of the calculation. As ranges as suppose to always get better > >>> with this algorithm, we simply compare for difference.. and this range > >>> is different, and thus we replace it. It only excludes 2 and 3. > >>> > >>> Next iteration through the less precise range DOES trigger an > >>> intersection operation in block 7, and when that is expanded to > >>> [irange] > >>> int [-INF, 1][4, 2147483644] MASK 0xfffffffd VALUE 0x0 using that we > >>> can > >>> again create the more precise range for _11 that started the cycle. and > >>> we go on and on and on. > >>> > >>> If we fix this so that we always expand subranges to reflect the lower > >>> bits in a bitmask, the initial value starts with > >>> > >>> [irange] int [-INF, -3][1, 1][4, 2147483644] MASK 0xfffffffc VALUE 0x1 > >>> > >>> And everything falls into place as it should. The fix is to be > >>> consistent about expanding those lower subranges. > >>> > >>> I also added a couple of minor performance tweaks to avoid unnecessary > >>> work, along with removing adjust_range () directly into > >>> set_range_from_bitmask () . > >>> > >>> I started at a 0.2% overall compilation increase (1.8% in VRP). In the > >>> end, this patch is down to 0.6% in VRP, and only 0.08% overall, so > >>> manageable for all the extra work. > >>> > >>> It also causes a few ripples in the testsuite so 3 test cases also > >>> needed adjustment: > >>> > >>> * gcc.dg/pr83072-2.c : With the addition of the expanded ranges, > >>> CCP > >>> use to export a global: > >>> Global Exported: c_3 = [irange] int [-INF, +INF] MASK 0xfffffffe > >>> VALUE 0x1 > >>> and now > >>> Global Exported: c_3 = [irange] int [-INF, -1][1, +INF] MASK > >>> 0xfffffffe VALUE 0x1 > >>> Which in turn enables forwprop to collapse part of the testcase much > >>> earlier. So I turned off forwprop for the testcase > >>> > >>> * gcc.dg/tree-ssa/phi-opt-value-5.c : WIth the expanded ranges, CCP2 > >>> pass use to export: > >>> Global Exported: d_3 = [irange] int [-INF, +INF] MASK 0xfffffffe > >>> VALUE 0x1 > >>> and now > >>> Global Exported: d_3 = [irange] int [-INF, -1][1, +INF] MASK > >>> 0xfffffffe VALUE 0x1 > >>> which in turn makes the following comment obsolete as the optimization > >>> does happen earlier.: > >>> /* fdiv1 requires until later than phiopt2 to be able to detect that > >>> d is non-zero. to be able to remove the conditional. */ > >>> Adjusted the testcase to expect everything to be taken care of by > >>> phi-opt2 pass. > >>> > >>> * gcc.dg/tree-ssa/vrp122.c : Previously, CCP exported: > >>> Global Exported: g_4 = [irange] unsigned int [0, +INF] MASK > >>> 0xfffffff0 VALUE 0x0 > >>> and then EVRP refined that and stored it, then the testcase tested for: > >>> Global Exported: g_4 = [irange] unsigned int [0, 0][16, +INF] MASK > >>> 0xfffffff0 VALUE 0x0 > >>> Now, CCP itself exported the expanded range, so there is nothing for > >>> VRP > >>> to do. > >>> adjusted the testcase to look for the expanded range in CCP. > >>> > >>> Now we never get into this situation where the bitmask is explicitly > >>> applied in some places and not others. > >>> > >>> Bootstraps on x86_64-pc-linux-gnu with no regressions. Finally. Is > >>> this OK for trunk, or should I hold off a little bit? > >> Please wait a little bit until after 15.1 is out. It's then OK for > >> trunk in > >> stage1 and backports when no issues show up. > >> > >> Thanks, > >> Richard. > >> > >>> Andrew > This is now in trunk. Attached are the patches for gcc15 and gcc14. > > Bootstrapped with no regressions on x86_64-pc-linux-gnu. > > Do you want me to check it in for either or both branches?
OK for 15. Richard. > > Andrew