https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121059

--- Comment #14 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 14 Jul 2025, rsandifo at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121059
> 
> --- Comment #10 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #9)
> > vectorizable_operation during transform does
> > 
> > >           /* When combining two masks check if either of them is elsewhere
> > >              combined with a loop mask, if that's the case we can mark that the
> > >              new combined mask doesn't need to be combined with a loop mask.  */
> >           if (masked_loop_p
> >               && code == BIT_AND_EXPR
> >               && VECTOR_BOOLEAN_TYPE_P (vectype))
> >             {
> >               if (loop_vinfo->scalar_cond_masked_set.contains ({ op0, 1 }))
> >                 {
> >                   mask = vect_get_loop_mask (loop_vinfo, gsi, masks,
> >                                              vec_num, vectype, i);
> > 
> > > but that's not reflected by analysis, which fails to record a loop mask
> > > for !mask_out_inactive operations.  So the fix is as simple as the
> > > following, but this might push us to using masks?  There is no good way
> > > to do this I guess.  The scalar_cond_masked_set optimization does not have
> > > a corresponding len operation.  I'm not sure what we can do here?
> > 
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 4aa69da2218..55002bd0cc2 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -6978,6 +6978,16 @@ vectorizable_operation (vec_info *vinfo,
> >               LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> >             }
> >         }
> > +      else if (loop_vinfo
> > +              && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +              && code == BIT_AND_EXPR
> > +              && VECTOR_BOOLEAN_TYPE_P (vectype))
> > +       vect_record_loop_mask (loop_vinfo, masks, vec_num, vectype, NULL);
> >  
> >        /* Put types on constant and invariant SLP children.  */
> >        if (!vect_maybe_update_slp_op_vectype (slp_op0, vectype)
> > 
> Yeah, we shouldn't do that.  The question is why op0 is in
> scalar_cond_masked_set with masked_loop_p true if there's no associated loop
> mask.

Possibly because with AVX512 the "sharing" of masks doesn't work the
same way as for SVE?  I have meanwhile pushed the above with added
&& !masks.is_empty ().  But with your remark I'm not sure that's
the correct fix.  See how vect_get_loop_mask distinguishes
LOOP_VINFO_PARTIAL_VECTORS_STYLE between
vect_partial_vectors_while_ult and vect_partial_vectors_avx512.

For the testcase I get vector(8):1 as mask, but the mask operands are
vector(16):1.
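
To make the analysis/transform mismatch concrete, here is a toy model
(C++17).  It is only a sketch: ToyMask, ToyLoopMasks, toy_analyze_bool_and
and toy_get_loop_mask are hypothetical names, not GCC internals, and the
lane-count lookup is an assumption standing in for whatever
vect_get_loop_mask really does per partial-vectors style.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a recorded loop mask: only the lane count matters here.
struct ToyMask { unsigned lanes; };

// Toy stand-in for the per-loop mask table (hypothetical, not vec_loop_masks).
struct ToyLoopMasks {
  std::vector<ToyMask> recorded;
  bool is_empty() const { return recorded.empty(); }
};

// Analysis side: record a mask for a boolean AND only when the loop already
// needs masks for some other statement (models the added !masks.is_empty ()).
inline void toy_analyze_bool_and(ToyLoopMasks &masks, bool can_use_partial,
                                 unsigned lanes) {
  if (can_use_partial && !masks.is_empty())
    masks.recorded.push_back({lanes});
}

// Transform side: hand out a mask only if analysis recorded one with a
// matching lane count; otherwise signal the mismatch with nullopt.
inline std::optional<ToyMask> toy_get_loop_mask(const ToyLoopMasks &masks,
                                                unsigned lanes) {
  for (const ToyMask &m : masks.recorded)
    if (m.lanes == lanes)
      return m;
  return std::nullopt;
}
```

In this model, a table that only holds an 8-lane mask yields nothing for a
16-lane request, which is the shape of the vector(8):1 versus vector(16):1
observation above.  Whether the real fix belongs in analysis or in the
scalar_cond_masked_set lookup is exactly the open question.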
