Re: [RFC 7/9] Update constant creation for BB SLP with predicated tails

Richard Biener Tue, 11 Nov 2025 04:56:48 -0800

On Fri, 7 Nov 2025, Christopher Bazley wrote:

> 
> On 07/11/2025 13:35, Richard Biener wrote:
> > On Wed, 5 Nov 2025, Christopher Bazley wrote:
> >
> >> On 28/10/2025 13:51, Richard Biener wrote:
> >>> On Tue, 28 Oct 2025, Christopher Bazley wrote:
> >>>
> >>>> vect_create_constant_vectors is updated to pad with zeros
> >>>> between the end of a group and the end of a vector of the type
> >>>> chosen for the SLP node, when used for BB SLP. This function
> >>>> calls gimple_build_vector, which also has to be updated for
> >>>> SVE vector types (by using the lower bound as the number of
> >>>> elements, e.g., 16 for VNx16QI).
> >>>> ---
> >>>>    gcc/gimple-fold.cc   |  2 +-
> >>>>    gcc/tree-vect-slp.cc | 43 +++++++++++++++++++++++++++++++++++--------
> >>>>    2 files changed, 36 insertions(+), 9 deletions(-)
> >>>>
> >>>> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> >>>> index edcc04adc08..e5fe0ea12a7 100644
> >>>> --- a/gcc/gimple-fold.cc
> >>>> +++ b/gcc/gimple-fold.cc
> >>>> @@ -11275,7 +11275,7 @@ gimple_build_vector (gimple_stmt_iterator *gsi,
> >>>>           {
> >>>>     gimple_seq seq = NULL;
> >>>>     tree type = builder->type ();
> >>>> -        unsigned int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >>>> +        unsigned int nelts = constant_lower_bound (TYPE_VECTOR_SUBPARTS
> >>>> (type));
> >>> I don't think this is desirable in a generic helper?  How does
> >>> the 'builder' class deal with the all-constant case?  It seems
> >>> handling for constant vs. non-constant will now differ semantically
> >>> (instead of ICEing in one case previously).
> >> This was the most minimal change I could make to get the feature working
> >> (whilst debugging many other issues) and it seemed harmless to me, so I
> >> didn't
> >> spend much time thinking about it.
> >>
> >> I know very little about the builder, but my understanding is that it would
> >> behave as though elements beyond the lower bound do not exist. e.g., if the
> >> vector type is VNx16QI then TREE_CONSTANT would return true for the
> >> CONSTRUCTOR node created by build_constructor if elements 0..15 are
> >> constant.
> >>
> >> This is presumably not safe general-purpose behaviour, because it would
> >> leave
> >> any other elements uninitialised (which does not matter for my use-case). I
> >> have no objection to trying to solve this elsewhere (probably in
> >> vect_create_constant_vectors) but I'll first need to revert this change and
> >> remind myself what breaks.
> > Fixing this upthread would be definitely better.  Not sure exactly how.
> > Alternatively the change could be done in a way to assert that
> > the tree_vector_builder has less than or exactly the same number
> > of elements as constant_lower_bound of nelts.  I don't exactly
> > remember what the builder tracks here and what constraints for
> > initialization of VLA vectors are.
> 
> I've done some further investigation.
> 
> One of the tests that failed without my change to gimple_build_vector was
> gcc.target/aarch64/sve/slp_6.c. I made that change to enable building of the
> following constant (among others):
> 
> _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
> 
> That constant has only 16 elements although the type of _70 is vector([16,16])
> unsigned char:
> 
> void vec_slp_int8_t (int8_t * restrict a, int8_t * restrict b, int n)
> {
> ...
> 
>   vector([16,16]) signed char vect_x0_43.58;
>   vector([16,16]) signed char vect__90.57;
>   vector([16,16]) unsigned char vect__89.56;
>   vector([16,16]) unsigned char vect__87.55;
>   vector([16,16]) signed char vect_x0_26.54;
>   vector([16,16]) signed char vect_x0_34.47;
> 
> ...
> 
>   vector([16,16]) signed char vect_x1_35.41;
> 
> ...
> 
>   vector([16,16]) signed char vect_x2_36.35;
> 
> ...
> 
>   void * _8;
>   vector([16,16]) signed char[3] * _9;
> 
> ...
> 
>   unsigned char _21;
>   vector([16,16]) unsigned char _22;
>   unsigned char _55;
>   vector([16,16]) unsigned char _56;
> 
> ...
> 
>   vector([16,16]) unsigned char _70;
>   vector([16,16]) unsigned char _84;
>   unsigned char _85;
> 
> ...
> 
>   <bb 5> [local count: 105119324]:
>   _84 = (vector([16,16]) unsigned char) vect_x0_34.47_82;
>   _85 = .REDUC_PLUS (_84);
>   _22 = (vector([16,16]) unsigned char) vect_x1_35.41_38;
>   _21 = .REDUC_PLUS (_22);
>   _56 = (vector([16,16]) unsigned char) vect_x2_36.35_58;
>   _55 = .REDUC_PLUS (_56);
>   _70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
>   vect__89.56_3 = vect__87.55_2 + _70;
>   vect__90.57_4 = VIEW_CONVERT_EXPR<vector([16,16]) signed 
> char>(vect__89.56_3);
> 
>   <bb 6> [local count: 118111600]:
>   # vect_x0_43.58_5 = PHI <vect__90.57_4(5), vect_x0_26.54_1(8)>
>   .MASK_STORE (b_25(D), 8B, { -1, -1, -1, 0, 0, 0, 0, 0, ... }, 
> vect_x0_43.58_5); [tail call]
>   return;
> 
> The compiled code looks correct, although movi d0,#0 only zeros the first 16
> bytes of the variable-length vector constant:
> 
>     addvl    x0, sp, #2
>     movi    d0, #0
>     st1b    z0.b, p6, [sp, #2, mul vl]
>     uaddv    d27, p6, z27.b
>     uaddv    d26, p6, z26.b
>     uaddv    d25, p6, z25.b
>     str    b27, [x0]
>     addvl    x0, sp, #1
>     add    x0, x0, 1
>     ptrue    p7.b, vl3
>     ld1b    z0.b, p6/z, [sp, #2, mul vl]
>     st1b    z0.b, p6, [sp, #1, mul vl]
>     str    b26, [x0]
>     ld1b    z0.b, p6/z, [sp, #1, mul vl]
>     st1b    z0.b, p6, [sp]
>     str    b25, [sp, 2]
>     ld1b    z0.b, p6/z, [sp]
>     add    z28.b, z0.b, z28.b
>     st1b    z28.b, p7, [x1]
>     addvl    sp, sp, #3
>     .cfi_def_cfa_offset 0
>     ret
> 
> (This code has already been noted to be inefficient, which I plan to address
> separately.)
> 
> The decision about how many bytes to zero is made in the calling function,
> vect_create_constant_vectors (which also uses constant_lower_bound), rather
> than in gimple_build_vector:
> 
>   unsigned int elt_count = group_size;
>   if (is_a<bb_vec_info> (vinfo))
>     {
>       /* We don't use duplicate_and_interleave for basic block 
> vectorization.
>     We know that the group size fits within a single vector, so all we need
>     to do for VLA is to pad the constant to the minimum vector length.  */
>       nunits = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vector_type));
>       elt_count = MAX (nunits, group_size);
>     }
> 
> My current understanding is that you don't object to this part of my change.
> Whatever happens in gimple_build_vector won’t alter the fact that only the
> minimum number of bytes are zeroed, and in most cases that’s the desirable
> outcome.
> 
> I therefore plan to keep my modification to gimple_build_vector, but add an
> assertion that builder->encoded_nelts () <= constant_lower_bound
> (TYPE_VECTOR_SUBPARTS (builder->type ())) so that the modified function never
> builds fewer elements than expected when one of them is non-constant. Would
> that be OK?


I'm not sure builder->encoded_nelts () is the correct thing to check
here.  In particular any stepped encoding should be rejected as well,
so nelts_per_pattern () must be <= 2.  And even then the interpretation
is then to fill with the last value IIRC, and as you get zero-filling
with building a "short" CTOR that last element should be a zero.  I'm
not sure how to get at the "last" value, but I think that given
we create a "short" CTOR we need to check that all remaining
elements of the VLA vector are actually encoded as zeros?

I hope Richard can give us a hint at what's the correct thing to do
there.  In principle creating a fixed-size vector CTOR and then
using a VEC_PERM to fill that with zeros like VEC_PERM <fixed-length,
VLA zero, { 0, 1, 2, 3, 4, 5, 6 ... }> would be that, or simply
a V_C_E of the fixed-length vector to the VLA vector.

Richard.

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [RFC 7/9] Update constant creation for BB SLP with predicated tails

Reply via email to