https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yeah.  Again, on x86 with -mavx2 we now have right after late FRE:

  <bb 2> [local count: 214748371]:
  vect__1.6_64 = MEM <vector(16) unsigned char> [(unsigned char *)b_24(D)];
  vect__1.7_63 = VEC_PERM_EXPR <vect__1.6_64, vect__1.6_64, { 0, 2, 1, 3, 4, 6, 5, 7, 8, 10, 9, 11, 12, 14, 13, 15 }>;
  vect__2.9_62 = [vec_unpack_lo_expr] vect__1.7_63;
  vect__2.9_59 = [vec_unpack_hi_expr] vect__1.7_63;
  vect__2.8_57 = [vec_unpack_lo_expr] vect__2.9_62;
  vect__2.8_56 = [vec_unpack_hi_expr] vect__2.9_62;
  vect__2.8_55 = [vec_unpack_lo_expr] vect__2.9_59;
  vect__2.8_54 = [vec_unpack_hi_expr] vect__2.9_59;
  MEM <vector(4) unsigned int> [(unsigned int *)&tmp] = vect__2.8_57;
  MEM <vector(4) unsigned int> [(unsigned int *)&tmp + 16B] = vect__2.8_56;
  MEM <vector(4) unsigned int> [(unsigned int *)&tmp + 32B] = vect__2.8_55;
  MEM <vector(4) unsigned int> [(unsigned int *)&tmp + 48B] = vect__2.8_54;
  vectp_b.4_65 = b_24(D) + 16;
  _8 = BIT_FIELD_REF <vect__2.8_57, 32, 0>;
  _22 = BIT_FIELD_REF <vect__2.8_56, 32, 0>;
  _30 = _8 + _22;
  _14 = BIT_FIELD_REF <vect__2.8_55, 32, 0>;
  _81 = BIT_FIELD_REF <vect__2.8_54, 32, 0>;
  _43 = _14 + _30;
  _45 = _43 + _81;
  sum_34 = (int) _45;
  _58 = BIT_FIELD_REF <vect__2.8_57, 32, 32>;
  _38 = BIT_FIELD_REF <vect__2.8_56, 32, 32>;
  _72 = _38 + _58;
  _68 = BIT_FIELD_REF <vect__2.8_55, 32, 32>;
  _53 = BIT_FIELD_REF <vect__2.8_54, 32, 32>;
  _29 = _68 + _72;
  _31 = _29 + _53;
  _7 = _31 + _45;
  sum_61 = (int) _7;
  _47 = BIT_FIELD_REF <vect__2.8_57, 32, 64>;
  _88 = BIT_FIELD_REF <vect__2.8_56, 32, 64>;
  _44 = _47 + _88;
  _74 = _44 + _90;
  _73 = _74 + _92;
  _83 = _7 + _73;
  sum_84 = (int) _83;
  _94 = BIT_FIELD_REF <vect__2.8_57, 32, 96>;
  _96 = BIT_FIELD_REF <vect__2.8_56, 32, 96>;
  _71 = _94 + _96;
  _98 = BIT_FIELD_REF <vect__2.8_55, 32, 96>;
  _100 = BIT_FIELD_REF <vect__2.8_54, 32, 96>;
  _70 = _71 + _98;
  _46 = _70 + _100;
  _18 = _46 + _83;
  sum_27 = (int) _18;
  tmp ={v} {CLOBBER};
  return sum_27;

I know the "real" code this testcase is from has actual operations in place
of the b[N] reads, but for the above, vectorization looks somewhat pointless
given that we end up decomposing the result again.

So the appropriate fix would of course be to vectorize the reduction loop
(but that hits the sign-changing reduction issue).
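
For readers without the testcase at hand, here is a rough C sketch of source that would plausibly produce a dump of this shape. It is a reconstruction inferred from the GIMPLE above and is not copied from the PR; the function name sum_columns is hypothetical. The fill loop corresponds to the 16-byte load, the { 0, 2, 1, 3, ... } permute and the two levels of unpacking. The second loop is the reduction that is left scalar: its additions are done in unsigned int while the accumulator is int, which appears to be the sign-changing reduction mentioned above.

```c
/* Hypothetical reconstruction inferred from the dump; not taken from the PR. */
int sum_columns(const unsigned char *b)
{
    unsigned int tmp[4][4];
    int sum = 0;

    /* Fill loop: widen 16 bytes to unsigned int, swapping the two middle
       elements of each group of four (the { 0, 2, 1, 3, ... } permute). */
    for (int i = 0; i < 4; i++, b += 4) {
        tmp[i][0] = b[0];
        tmp[i][1] = b[2];
        tmp[i][2] = b[1];
        tmp[i][3] = b[3];
    }

    /* Reduction loop: each column is summed in unsigned int, then folded
       into the int accumulator, so every step converts between int and
       unsigned int (compare the "sum_N = (int) _M" statements). */
    for (int i = 0; i < 4; i++)
        sum += tmp[0][i] + tmp[1][i] + tmp[2][i] + tmp[3][i];

    return sum;
}
```

Because the stores to tmp are immediately read back lane by lane, vectorizing only the fill loop buys little here; the vector values are taken apart again with BIT_FIELD_REF for the scalar adds.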