https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121049
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the issue is that we do

  _79 = _78 > { 0, 1, 2, 3, 4, 5, 6, 7 };
  vect__12.20_57 = .MASK_LOAD (vectp_mon_lengths.19_51, 256B, _79, { 0, 0, 0, 0, 0, 0, 0, 0 });
  vect_patt_18.21_58 = WIDEN_MULT_EVEN_EXPR <vect__12.20_57, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
  vect_patt_18.21_59 = WIDEN_MULT_ODD_EXPR <vect__12.20_57, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
  _63 = VIEW_CONVERT_EXPR<vector(4) <signed-boolean:1>>(_79);
  vect_value_4.23_64 = .COND_ADD (_63, vect_patt_18.21_58, _60, _60);
  _65 = VIEW_CONVERT_EXPR<unsigned char>(_79);
  _66 = _65 >> 4;
  _67 = VIEW_CONVERT_EXPR<vector(4) <signed-boolean:1>>(_66);
  vect_value_4.23_68 = .COND_ADD (_67, vect_patt_18.21_59, vect_value_4.23_64, vect_value_4.23_64);

so we use an even/odd widen mult for the reduction - which ultimately is OK, but
when we do loop masking it is not, since we then mask the wrong elements. We'd
need a lo/hi widen mult, or alternatively do an even/odd extract of the loop
mask instead of taking the lo/hi halves when distributing it.
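
To make the lane mismatch concrete, here is a minimal scalar sketch in plain C
(not the vectorizer's code; all names are made up for illustration). With an
even/odd widen mult the two half-width results come from source lanes
{0,2,4,6} and {1,3,5,7}, so applying the lo/hi halves of the loop mask enables
the wrong source elements; an even/odd extract of the mask lines up correctly:

  /* Hypothetical sketch of the masking mismatch, assuming VF=8 and a
     loop mask that enables the first n lanes.  */
  #include <stdio.h>

  #define VF 8

  int main (void)
  {
    int a[VF] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    int n = 5;                 /* active iterations */
    int mask[VF];              /* loop mask, one flag per source lane */
    for (int i = 0; i < VF; i++)
      mask[i] = i < n;

    /* Reference: masked reduction of 2*a[i].  */
    long ref = 0;
    for (int i = 0; i < VF; i++)
      if (mask[i])
        ref += 2L * a[i];

    /* Even/odd widening multiply: even[] holds source lanes 0,2,4,6,
       odd[] holds source lanes 1,3,5,7.  */
    long even[VF / 2], odd[VF / 2];
    for (int i = 0; i < VF / 2; i++)
      {
        even[i] = 2L * a[2 * i];
        odd[i]  = 2L * a[2 * i + 1];
      }

    /* Buggy: distribute the mask lo/hi, as the code above does.
       even[] gets mask lanes 0..3, odd[] gets mask lanes 4..7.  */
    long buggy = 0;
    for (int i = 0; i < VF / 2; i++)
      {
        if (mask[i])            /* lo half of the mask */
          buggy += even[i];
        if (mask[VF / 2 + i])   /* hi half of the mask */
          buggy += odd[i];
      }

    /* Fixed: extract the mask even/odd, matching the data distribution:
       even[i] came from source lane 2*i, odd[i] from lane 2*i+1.  */
    long fixed = 0;
    for (int i = 0; i < VF / 2; i++)
      {
        if (mask[2 * i])
          fixed += even[i];
        if (mask[2 * i + 1])
          fixed += odd[i];
      }

    printf ("ref=%ld buggy=%ld fixed=%ld\n", ref, buggy, fixed);
    return 0;
  }

With n=5 this prints ref=30 buggy=36 fixed=30: the lo/hi split wrongly sums
the even-lane product of a[6] while dropping the odd-lane products of a[1]
and a[3], whereas the even/odd mask extract matches the reference.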