https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70354
--- Comment #8 from Jakub Jelinek &lt;jakub at gcc dot gnu.org&gt; ---
Regarding the shift of a negative value, I wonder if it isn't actually a UBSAN bug. Marek, does C really say that -1 << 0 is invalid, or just -1 << 1?

In any case, it is trivial to change the testcase so that it is not questionable:

long long int b[64], c[64], g[64];
unsigned long long int a[64], d[64], e[64], f[64], h[64];

__attribute__ ((noinline, noclone)) void
foo (void)
{
  int i;
  for (i = 0; i < 64; i++)
    {
      d[i] = h[i] << (((((unsigned long long int) b[i] * e[i])
			<< (-a[i] - 3752448776177690134ULL))
		       - 8214565720323784703ULL) - 1ULL);
      e[i] = (_Bool) (f[i] + (unsigned long long int) g[i]);
      g[i] = c[i];
    }
}

int
main ()
{
  int i;
  for (i = 0; i < 64; ++i)
    {
      a[i] = 14694295297531861425ULL;
      b[i] = -1725558902283030715LL;
      c[i] = 4402992416302558097LL;
      e[i] = 6297173129107286501ULL;
      f[i] = 13865724171235650855ULL;
      g[i] = 982871027473857427LL;
      h[i] = 8193845517487445944ULL;
    }
  foo ();
  for (i = 0; i < 64; i++)
    if (d[i] != 8193845517487445944ULL
	|| e[i] != 1
	|| g[i] != 4402992416302558097ULL)
      __builtin_abort ();
  return 0;
}

I don't see any UB there. The first left shift is by 57 and gives
0x7200000000000000ULL; we then subtract 8214565720323784703ULL and 1ULL,
i.e. the same number in total, and get 0, so the second shift is by 0.

The bug is in the vectorizer (or in pattern detection). In the scalar code we had:

  _9 = a[i_29];
  _10 = (unsigned int) _9;
  _11 = 705976810 - _10;
  _12 = _8 << _11;
  _13 = (unsigned int) _12;
  _14 = _4 << _13;

while in the vectorized code we have:

  vect__9.20_26 = MEM[(long long unsigned int *)vectp_a.18_28];
  vectp_a.18_2 = vectp_a.18_28 + 16;
  vect__9.21_1 = MEM[(long long unsigned int *)vectp_a.18_2];
  _9 = a[i_29];
  vect__10.22_56 = VEC_PACK_TRUNC_EXPR <vect__9.20_26, vect__9.21_1>;
  _10 = (unsigned int) _9;
  vect__11.23_58 = vect_cst__57 - vect__10.22_56;
  _11 = 705976810 - _10;
  vect_patt_53.24_59 = [vec_unpack_lo_expr] vect__11.23_58;
  vect_patt_53.24_60 = [vec_unpack_hi_expr] vect__11.23_58;
  vect_patt_52.25_61 = vect__8.17_32 << vect_patt_53.24_59;
  vect_patt_52.25_62 = vect__8.17_31 << vect_patt_53.24_60;
  _12 = _8 << _11;
  _13 = (unsigned int) _12;
  vect_patt_51.26_63 = vect__4.6_47 << vect_patt_52.25_61;
  vect_patt_51.26_64 = vect__4.7_45 << vect_patt_52.25_62;
  _14 = _4 << _13;

As you can see, the (unsigned int) cast is lost in the vectorized code: we really need to shift just by the low 32 bits of the 64-bit result, not by all of its bits (the subtraction was optimized away earlier, as the shift counts are 32-bit). I'll have a look.
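
To double-check the shift-count arithmetic above, here is a small standalone sketch (my own illustration, not part of the testcase) that recomputes both counts from the constants used in main:

#include <stdio.h>

int
main ()
{
  unsigned long long a = 14694295297531861425ULL;
  long long b = -1725558902283030715LL;
  unsigned long long e = 6297173129107286501ULL;
  /* Inner shift count: -a wraps modulo 2^64; the difference is 57.  */
  unsigned long long s1 = -a - 3752448776177690134ULL;
  /* Product shifted left by 57 gives 0x7200000000000000.  */
  unsigned long long p = ((unsigned long long) b * e) << s1;
  /* Outer shift count: 0x7200000000000000 - 8214565720323784703 - 1 == 0.  */
  unsigned long long s2 = (p - 8214565720323784703ULL) - 1ULL;
  printf ("s1 = %llu, p = 0x%llx, s2 = %llu\n", s1, p, s2);
  return 0;
}

This prints s1 = 57, p = 0x7200000000000000, s2 = 0, so every shift count is in range and d[i] must equal h[i] << 0, i.e. h[i].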
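And to make the lost cast concrete (again just an illustration, not GCC code): in the scalar IL the truncating cast means only the low 32 bits of _12 select the final shift count, so with _12 = 0x7200000000000000 the shift is by 0:

unsigned long long
scalar_semantics (unsigned long long _4, unsigned long long _12)
{
  unsigned int _13 = (unsigned int) _12;  /* low 32 bits, here 0 */
  return _4 << _13;                       /* _4 << 0 == _4 */
}

The vectorized code instead hands the full 64-bit element to the vector shift; for an out-of-range count like 0x7200000000000000, a variable vector shift (e.g. AVX2 vpsllvq, which zeroes the element for counts above 63) produces 0 rather than _4, which is why d[i] comes out wrong.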