Jakub Jelinek <ja...@redhat.com> writes:

> Hi!
>
> The following testcase is miscompiled during evrp.
> Before vrp, we have (from ccp):
>   # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 
> VALUE 0x2d
>   _3 = _2 + 18446744073708503085;
> ...
>   # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc000 
> VALUE 0x59
>   _6 = (long long unsigned int) _5;
>   # RANGE [irange] int [-INF, +INF] MASK 0xffffc000 VALUE 0x34
>   _7 = k_11 + -1048524;
>   switch (_7) <default: <L5> [33.33%], case 8: <L7> [33.33%], case 24: <L6> 
> [33.33%], case 32: <L6> [33.33%]>
> ...
>   # RANGE [irange] long long unsigned int [0, +INF] MASK 0xffffffffffffc07d 
> VALUE 0x0
>   # i_20 = PHI <_3(4), 0(3), _6(2)>
> and evrp is now trying to figure out range for i_20 in range_of_phi.
>
> All the ranges and MASK/VALUE pairs above are correct for the testcase,
> k_11 and _2 based on it is a result of multiplication by a constant with low
> 14 bits cleared and then some numbers are added to it.
>
> There is an obvious missed optimization for which I've filed PR119039,
> simplify_switch_using_ranges could see that all the labels but default
> are unreachable because the controlling expression has
> MASK 0xffffc000 VALUE 0x34 and none of 8, 24 and 32 satisfy that.
>
> Anyway, during range_of_phi for i_20, we process the PHI arguments
> in order.  For the _3(4) case, we figure out that it is reachable
> through the case 24: case 32: labels only of the switch and that
> 0x34 - 0x2d is 7, so derive
> [irange] long long unsigned int [17, 17][25, 25] MASK 0xffffffffffffc000 
> VALUE 0x2d
> (the MASK/VALUE just got inherited from the _3 earlier range).
> Now (not suprisingly because those labels aren't actually reachable),
> that range is inconsistent, 0x2d is 45, so there is conflict between the
> values and the irange_bitmask.
> value-range.{h,cc} code differentiates between actually stored
> irange_bitmask, which is that MASK 0xffffffffffffc000 VALUE 0x2d, and
> semantic bitmask, which is what get_bitmask returns.  That is
>   // The mask inherent in the range is calculated on-demand.  For
>   // example, [0,255] does not have known bits set by default.  This
>   // saves us considerable time, because setting it at creation incurs
>   // a large penalty for irange::set.  At the time of writing there
>   // was a 5% slowdown in VRP if we kept the mask precisely up to date
>   // at all times.  Instead, we default to -1 and set it when
>   // explicitly requested.  However, this function will always return
>   // the correct mask.
>   //
>   // This also means that the mask may have a finer granularity than
>   // the range and thus contradict it.  Think of the mask as an
>   // enhancement to the range.  For example:
>   //
>   // [3, 1000] MASK 0xfffffffe VALUE 0x0
>   //
>   // 3 is in the range endpoints, but is excluded per the known 0 bits
>   // in the mask.
>   //
>   // See also the note in irange_bitmask::intersect.
>   irange_bitmask bm
>     = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
>   if (!m_bitmask.unknown_p ())
>     bm.intersect (m_bitmask);
> Now, get_bitmask_from_range here is MASK 0x1f VALUE 0x0 and it intersects
> that with that MASK 0xffffffffffffc000 VALUE 0x2d.
> Which triggers the ugly special case in irange_bitmask::intersect:
>   // If we have two known bits that are incompatible, the resulting
>   // bit is undefined.  It is unclear whether we should set the entire
>   // range to UNDEFINED, or just a subset of it.  For now, set the
>   // entire bitmask to unknown (VARYING).
>   if (wi::bit_and (~(m_mask | src.m_mask),
>                    m_value ^ src.m_value) != 0)
>     {
>       unsigned prec = m_mask.get_precision ();
>       m_mask = wi::minus_one (prec);
>       m_value = wi::zero (prec);
>     }
> so the semantic bitmask is actually MASK 0xffffffffffffffff VALUE 0x0.
>
> Next, range_of_phi attempts to union it with the 0(3) PHI argument,
> and during irange::union_ first adds the [0,0] to the subranges, so
> [irange] long long unsigned int [0, 0][17, 17][25, 25] MASK 
> 0xffffffffffffc000 VALUE 0x2d
> and then goes on to irange::union_bitmask which does
>   if (m_bitmask == r.m_bitmask)
>     return false;
>   irange_bitmask bm = get_bitmask ();
>   irange_bitmask save = bm;
>   bm.union_ (r.get_bitmask ());
>   if (save == bm)
>     return false;
>   m_bitmask = bm;
>   if (save == get_bitmask ())
>     return false;
> m_bitmask MASK 0xffffffffffffc000 VALUE 0x2d isn't the same as
> r.m_bitmask MASK 0x0 VALUE 0x0, so we compute the semantic bitmask
> (but note, not from the original range before union, but the modified one,
> dunno if that isn't a problem as well), which is still the VARYING/unknown_p
> one, union_ that with MASK 0x0 VALUE 0x0 and get still
> MASK 0xffffffffffffffff VALUE 0x0, so don't update anything, the semantic
> bitmask didn't change, so we are fine (not!, see later).
>
> Except then we try to union with the third PHI argument.  And, because the
> edge to that comes only from case 8: label and there is a known difference
> between the two, the argument is actually already from earlier replaced by
> 45(2) constant.  So, irange::union_ adds the [45, 45] range to the list
> of subranges, but voila, 45 is 0x2d and satisfies the stored
> MASK 0xffffffffffffc000 VALUE 0x2d and so the semantic bitmask changed to
> from MASK 0xffffffffffffffff VALUE 0x0 to MASK 0xffffffffffffc000 VALUE 0x2d
> by that addition.  Eventually, we just optimize this to
> [irange] long long unsigned int [45, 45] because that is the only range
> which satisfies the bitmask.  And that is wrong, at runtime i_20 has
> value 0.
>
> The following patch attempts to detect this case where get_bitmask
> turns some non-VARYING m_bitmask into VARYING one because of a conflict
> and in that case makes sure m_bitmask is actually updated rather than
> unmodified, so that later union_ doesn't cause problems.
>
> I also wonder whether e.g. get_bitmask couldn't have special case for this
> and if bm.intersect (m_bitmask); yields unknown_p from something not
> originally unknown_p, perhaps chooses to just use get_bitmask_from_range
> value and ignore the stored m_bitmask.  Though, dunno how union_bitmask
> in that case would figure out it needs to update m_bitmask.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> and 14.3 (after a while)?

LGTM.  Thanks for doing a deep dive here.

>
> 2025-02-27  Jakub Jelinek  <ja...@redhat.com>
>
>       PR tree-optimization/118953
>       * value-range.cc (irange::union_bitmask): Update m_bitmask if
>       get_bitmask () is unknown_p and m_bitmask is not even when the
>       semantic bitmask didn't change and returning false.
>
>       * gcc.dg/torture/pr118953.c: New test.
>
> --- gcc/value-range.cc.jj     2025-01-02 11:23:25.118396777 +0100
> +++ gcc/value-range.cc        2025-02-26 18:49:10.713107905 +0100
> @@ -2447,7 +2447,7 @@ irange::union_bitmask (const irange &r)
>    irange_bitmask bm = get_bitmask ();
>    irange_bitmask save = bm;
>    bm.union_ (r.get_bitmask ());
> -  if (save == bm)
> +  if (save == bm && (!bm.unknown_p () || m_bitmask.unknown_p ()))
>      return false;

I wonder if it's better to swap the operands of &&, since the equality
operator is more expensive than unknown_p.  Either way, it's fine.

Aldy

>  
>    m_bitmask = bm;
> --- gcc/testsuite/gcc.dg/torture/pr118953.c.jj        2025-02-26 
> 18:51:47.139936059 +0100
> +++ gcc/testsuite/gcc.dg/torture/pr118953.c   2025-02-26 18:52:33.078298359 
> +0100
> @@ -0,0 +1,42 @@
> +/* PR tree-optimization/118953 */
> +/* { dg-do run { target int32plus } } */
> +
> +int a, d;
> +long long b, c;
> +
> +int
> +foo (int f, int g, unsigned long long h, long long j)
> +{
> +  unsigned long long i = 0;
> +  if (g)
> +    switch (f)
> +      {
> +      case 8:
> +     i = b;
> +     break;
> +      case 6:
> +     i = c;
> +     break;
> +      }
> +  else
> +    switch (f)
> +      {
> +      case 8:
> +     i = h;
> +     break;
> +      case 24:
> +      case 32:
> +     i = j;
> +     break;
> +      }
> +  return i;
> +}
> +
> +int
> +main ()
> +{
> +  int k = a * (409628 - 28);
> +  d = foo (k - 1048524, 0, k - 1048487, k - 1048531ULL);
> +  if (d)
> +    __builtin_abort ();
> +}
>
>       Jakub

Reply via email to