https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70354
--- Comment #8 from Jakub Jelinek &lt;jakub at gcc dot gnu.org&gt; ---
Regarding the shift of a negative value, I wonder if it isn't actually a UBSAN bug. Marek, does C really say that -1 << 0 is invalid, or just -1 << 1?

In any case, it is trivial to change the testcase so that it is not questionable:

long long int b[64], c[64], g[64];
unsigned long long int a[64], d[64], e[64], f[64], h[64];

__attribute__ ((noinline, noclone)) void
foo (void)
{
  int i;
  for (i = 0; i < 64; i++)
    {
      d[i] = h[i] << (((((unsigned long long int) b[i] * e[i])
			<< (-a[i] - 3752448776177690134ULL))
		       - 8214565720323784703ULL) - 1ULL);
      e[i] = (_Bool) (f[i] + (unsigned long long int) g[i]);
      g[i] = c[i];
    }
}

int
main ()
{
  int i;
  for (i = 0; i < 64; ++i)
    {
      a[i] = 14694295297531861425ULL;
      b[i] = -1725558902283030715LL;
      c[i] = 4402992416302558097LL;
      e[i] = 6297173129107286501ULL;
      f[i] = 13865724171235650855ULL;
      g[i] = 982871027473857427LL;
      h[i] = 8193845517487445944ULL;
    }
  foo ();
  for (i = 0; i < 64; i++)
    if (d[i] != 8193845517487445944ULL
	|| e[i] != 1
	|| g[i] != 4402992416302558097ULL)
      __builtin_abort ();
  return 0;
}

I don't see any UB there. The first left shift is by 57 and gives
0x7200000000000000ULL; we then subtract 8214565720323784703ULL and 1ULL,
i.e. the same number in total, and get 0, so the second shift is by 0.

The bug is in the vectorizer (or in pattern detection). In the scalar code we had:

  _9 = a[i_29];
  _10 = (unsigned int) _9;
  _11 = 705976810 - _10;
  _12 = _8 << _11;
  _13 = (unsigned int) _12;
  _14 = _4 << _13;

while in the vectorized code we have:

  vect__9.20_26 = MEM[(long long unsigned int *)vectp_a.18_28];
  vectp_a.18_2 = vectp_a.18_28 + 16;
  vect__9.21_1 = MEM[(long long unsigned int *)vectp_a.18_2];
  _9 = a[i_29];
  vect__10.22_56 = VEC_PACK_TRUNC_EXPR <vect__9.20_26, vect__9.21_1>;
  _10 = (unsigned int) _9;
  vect__11.23_58 = vect_cst__57 - vect__10.22_56;
  _11 = 705976810 - _10;
  vect_patt_53.24_59 = [vec_unpack_lo_expr] vect__11.23_58;
  vect_patt_53.24_60 = [vec_unpack_hi_expr] vect__11.23_58;
  vect_patt_52.25_61 = vect__8.17_32 << vect_patt_53.24_59;
  vect_patt_52.25_62 = vect__8.17_31 << vect_patt_53.24_60;
  _12 = _8 << _11;
  _13 = (unsigned int) _12;
  vect_patt_51.26_63 = vect__4.6_47 << vect_patt_52.25_61;
  vect_patt_51.26_64 = vect__4.7_45 << vect_patt_52.25_62;
  _14 = _4 << _13;

As you can see, the (unsigned int) cast is lost in the vectorized code: we really need to shift just by the low 32 bits of the 64-bit result, not by all of its bits (the subtraction was optimized away earlier, as the shift counts are 32-bit). I'll have a look.
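
To double-check the shift-count arithmetic above, here is a small standalone sketch (my own illustration, not part of the testcase) that recomputes both counts from the constants used in main:

#include <stdio.h>

int
main ()
{
  unsigned long long a = 14694295297531861425ULL;
  long long b = -1725558902283030715LL;
  unsigned long long e = 6297173129107286501ULL;
  /* Inner shift count: -a wraps modulo 2^64; the difference is 57.  */
  unsigned long long s1 = -a - 3752448776177690134ULL;
  /* Product shifted left by 57 gives 0x7200000000000000.  */
  unsigned long long p = ((unsigned long long) b * e) << s1;
  /* Outer shift count: 0x7200000000000000 - 8214565720323784703 - 1 == 0.  */
  unsigned long long s2 = (p - 8214565720323784703ULL) - 1ULL;
  printf ("s1 = %llu, p = 0x%llx, s2 = %llu\n", s1, p, s2);
  return 0;
}

This prints s1 = 57, p = 0x7200000000000000, s2 = 0, so every shift count is in range and d[i] must equal h[i] << 0, i.e. h[i].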
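And to make the lost cast concrete (again just an illustration, not GCC code): in the scalar IL the truncating cast means only the low 32 bits of _12 select the final shift count, so with _12 = 0x7200000000000000 the shift is by 0:

unsigned long long
scalar_semantics (unsigned long long _4, unsigned long long _12)
{
  unsigned int _13 = (unsigned int) _12;  /* low 32 bits, here 0 */
  return _4 << _13;                       /* _4 << 0 == _4 */
}

The vectorized code instead hands the full 64-bit element to the vector shift; for an out-of-range count like 0x7200000000000000, a variable vector shift (e.g. AVX2 vpsllvq, which zeroes the element for counts above 63) produces 0 rather than _4, which is why d[i] comes out wrong.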