https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121059
--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 14 Jul 2025, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121059
>
> --- Comment #10 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #9)
> > vectorizable_operation during transform does
> >
> >           /* When combining two masks check if either of them is elsewhere
> >              combined with a loop mask, if that's the case we can mark that the
> >              new combined mask doesn't need to be combined with a loop mask.  */
> >           if (masked_loop_p
> >               && code == BIT_AND_EXPR
> >               && VECTOR_BOOLEAN_TYPE_P (vectype))
> >             {
> >               if (loop_vinfo->scalar_cond_masked_set.contains ({ op0, 1 }))
> >                 {
> >                   mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
> >                                              vec_num, vectype, i);
> >
> > but that's not reflected by analysis, which misses to record a loop mask
> > for !mask_out_inactive operations.  So the fix is as simple as the
> > following, but this might put us to using masks?  There is no good way to
> > do this I guess.  The scalar_cond_masked_set optimization does not have a
> > corresponding len operation.  I'm not sure what we can do here?
> >
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 4aa69da2218..55002bd0cc2 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -6978,6 +6978,16 @@ vectorizable_operation (vec_info *vinfo,
> >               LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> >             }
> >         }
> > +      else if (loop_vinfo
> > +              && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +              && code == BIT_AND_EXPR
> > +              && VECTOR_BOOLEAN_TYPE_P (vectype))
> > +       vect_record_loop_mask (loop_vinfo, masks, vec_num, vectype, NULL);
> >
> >        /* Put types on constant and invariant SLP children.  */
> >        if (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
>
> Yeah, we shouldn't do that.  The question is why op0 is in
> scalar_cond_masked_set with masked_loop_p true if there's no associated loop
> mask.

Possibly because with AVX512 the "sharing" of masks doesn't work the same
way as for SVE?

I have meanwhile pushed the above with added && !masks.is_empty ().
But with your remark I'm not sure that's the correct fix.  See how
vect_get_loop_mask distinguishes LOOP_VINFO_PARTIAL_VECTORS_STYLE
between vect_partial_vectors_while_ult and vect_partial_vectors_avx512.

For the testcase I get vector(8):1 as mask but the mask operands
are vector(16):1
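
To make the analysis/transform mismatch discussed above concrete, here is a toy model in C++.  It is not GCC code: ToyLoopInfo and every toy_* function are hypothetical names invented for illustration, and the real vect_record_loop_mask / vect_get_loop_mask do considerably more.  The 8 and 16 in the usage note only echo the element counts mentioned above; the model says nothing about why they differ in the real testcase, nor whether recording the extra mask is the right fix.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model only: none of these types or functions are GCC's.
struct ToyLoopInfo {
  bool can_use_partial_vectors = true;
  // Key: number of mask elements (e.g. 8 or 16); value: vectors needed.
  std::map<unsigned, unsigned> recorded_masks;
  // Scalar conditions that are combined with a loop mask elsewhere.
  std::set<std::string> scalar_cond_masked_set;
};

// Analysis-side bookkeeping: remember that a mask of this shape is needed.
void toy_record_loop_mask(ToyLoopInfo &info, unsigned nelts,
                          unsigned nvectors) {
  assert(nvectors > 0);
  unsigned &n = info.recorded_masks[nelts];
  if (nvectors > n)
    n = nvectors;
}

// Transform-side lookup: succeeds only for shapes that analysis recorded.
bool toy_get_loop_mask(const ToyLoopInfo &info, unsigned nelts,
                       unsigned index) {
  auto it = info.recorded_masks.find(nelts);
  return it != info.recorded_masks.end() && index < it->second;
}

// Analysis of a mask AND.  `record_for_and` models the extra recording step
// from the quoted patch, including the "masks not empty" guard.
void toy_analyze_mask_and(ToyLoopInfo &info, unsigned nelts,
                          bool record_for_and) {
  if (record_for_and && info.can_use_partial_vectors
      && !info.recorded_masks.empty())
    toy_record_loop_mask(info, nelts, 1);
}

// Transform of a mask AND: requests a loop mask when op0 is in the set.
// Returns false when the requested mask was never recorded.
bool toy_transform_mask_and(const ToyLoopInfo &info, const std::string &op0,
                            unsigned nelts) {
  if (info.scalar_cond_masked_set.count(op0))
    return toy_get_loop_mask(info, nelts, 0);
  return true;  // no loop mask requested
}
```

In this model, if some other statement recorded an 8-element mask and the AND operates on 16-element masks, toy_transform_mask_and fails when toy_analyze_mask_and ran with record_for_and false and succeeds when it ran with record_for_and true.  The point is only that transform may request exactly what analysis recorded, nothing more.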