https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121059
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
vectorizable_operation during transform does

          /* When combining two masks check if either of them is elsewhere
             combined with a loop mask, if that's the case we can mark that the
             new combined mask doesn't need to be combined with a loop mask.  */
          if (masked_loop_p
              && code == BIT_AND_EXPR
              && VECTOR_BOOLEAN_TYPE_P (vectype))
            {
              if (loop_vinfo->scalar_cond_masked_set.contains ({ op0, 1 }))
                {
                  mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
                                             vec_num, vectype, i);

but that's not reflected by analysis, which fails to record a loop mask
for !mask_out_inactive operations.  So the fix is as simple as the following,
but this might push us into using masks?  There is no good way to do this
I guess.  The scalar_cond_masked_set optimization does not have a
corresponding len operation.  I'm not sure what we can do here?

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4aa69da2218..55002bd0cc2 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6978,6 +6978,16 @@ vectorizable_operation (vec_info *vinfo,
          LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
        }
     }
+  else if (loop_vinfo
+          && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+          && code == BIT_AND_EXPR
+          && VECTOR_BOOLEAN_TYPE_P (vectype))
+    vect_record_loop_mask (loop_vinfo, masks, vec_num, vectype, NULL);
 
   /* Put types on constant and invariant SLP children.  */
   if (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
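
To make the analysis/transform mismatch described above concrete, here is a toy model in standalone C++. It is not GCC code: every type and function in it (ToyLoopMasks, toy_record_loop_mask, toy_analyze_bit_and, toy_transform_bit_and) is hypothetical and only loosely mirrors the shape of vect_record_loop_mask and vect_get_loop_mask. It assumes that the transform phase may only use a loop mask that the analysis phase recorded earlier, and it models a missing record as a plain "false" result rather than whatever the real failure mode is.

```cpp
// Toy model only: none of these names exist in GCC.
#include <cassert>
#include <set>
#include <utility>

// Stand-in for the loop's mask bookkeeping: the set of
// (number of vectors, vector type id) pairs that analysis asked for.
struct ToyLoopMasks {
    std::set<std::pair<unsigned, unsigned>> recorded;
};

// Analysis phase: note that a loop mask of this shape will be needed.
void toy_record_loop_mask(ToyLoopMasks &masks, unsigned nvectors,
                          unsigned type_id) {
    masks.recorded.emplace(nvectors, type_id);
}

// Analysis of a BIT_AND on vector booleans.  'record_for_bit_and'
// toggles the extra recording step that the patch above proposes.
void toy_analyze_bit_and(ToyLoopMasks &masks, bool can_use_partial_vectors,
                         bool record_for_bit_and, unsigned nvectors,
                         unsigned type_id) {
    if (can_use_partial_vectors && record_for_bit_and)
        toy_record_loop_mask(masks, nvectors, type_id);
}

// Transform of the same BIT_AND.  If the operand is elsewhere combined
// with a loop mask (modelled by 'op_in_cond_masked_set'), the transform
// wants a loop mask.  Returns false when that mask was never recorded.
bool toy_transform_bit_and(const ToyLoopMasks &masks, bool masked_loop,
                           bool op_in_cond_masked_set, unsigned nvectors,
                           unsigned type_id) {
    if (!masked_loop || !op_in_cond_masked_set)
        return true;  // no loop mask requested, nothing can go wrong
    return masks.recorded.count(std::make_pair(nvectors, type_id)) != 0;
}
```

In this model, running toy_analyze_bit_and with record_for_bit_and set to false and then calling toy_transform_bit_and with op_in_cond_masked_set set to true reports a missing mask; with record_for_bit_and set to true the request is satisfied. The model deliberately leaves out the open question from the comment, namely that recording a mask may steer the loop toward mask-based partial vectors when a len-based scheme was intended.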