On 07/11/2025 13:35, Richard Biener wrote:
On Wed, 5 Nov 2025, Christopher Bazley wrote:
On 28/10/2025 13:51, Richard Biener wrote:
On Tue, 28 Oct 2025, Christopher Bazley wrote:
vect_create_constant_vectors is updated to pad with zeros
between the end of a group and the end of a vector of the type
chosen for the SLP node, when used for BB SLP. This function
calls gimple_build_vector, which also has to be updated for
SVE vector types (by using the lower bound as the number of
elements, e.g., 16 for VNx16QI).
---
gcc/gimple-fold.cc | 2 +-
gcc/tree-vect-slp.cc | 43 +++++++++++++++++++++++++++++++++++--------
2 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index edcc04adc08..e5fe0ea12a7 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -11275,7 +11275,7 @@ gimple_build_vector (gimple_stmt_iterator *gsi,
{
gimple_seq seq = NULL;
tree type = builder->type ();
- unsigned int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+ unsigned int nelts = constant_lower_bound (TYPE_VECTOR_SUBPARTS
(type));
I don't think this is desirable in a generic helper? How does
the 'builder' class deal with the all-constant case? It seems
handling for constant vs. non-constant will now differ semantically
(instead of ICEing in one case previously).
This was the most minimal change I could make to get the feature working
(whilst debugging many other issues) and it seemed harmless to me, so I
didn't
spend much time thinking about it.
I know very little about the builder, but my understanding is that it would
behave as though elements beyond the lower bound do not exist. e.g., if the
vector type is VNx16QI then TREE_CONSTANT would return true for the
CONSTRUCTOR node created by build_constructor if elements 0..15 are
constant.
This is presumably not safe general-purpose behaviour, because it would
leave
any other elements uninitialised (which does not matter for my use-case). I
have no objection to trying to solve this elsewhere (probably in
vect_create_constant_vectors) but I'll first need to revert this change and
remind myself what breaks.
Fixing this upthread would be definitely better. Not sure exactly how.
Alternatively the change could be done in a way to assert that
the tree_vector_builder has less than or exactly the same number
of elements as constant_lower_bound of nelts. I don't exactly
remember what the builder tracks here and what constraints for
initialization of VLA vectors are.
I've done some further investigation.
One of the tests that failed without my change to gimple_build_vector was
gcc.target/aarch64/sve/slp_6.c. I made that change to enable building of the
following constant (among others):
_70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
That constant has only 16 elements although the type of _70 is vector([16,16])
unsigned char:
void vec_slp_int8_t (int8_t * restrict a, int8_t * restrict b, int n)
{
...
vector([16,16]) signed char vect_x0_43.58;
vector([16,16]) signed char vect__90.57;
vector([16,16]) unsigned char vect__89.56;
vector([16,16]) unsigned char vect__87.55;
vector([16,16]) signed char vect_x0_26.54;
vector([16,16]) signed char vect_x0_34.47;
...
vector([16,16]) signed char vect_x1_35.41;
...
vector([16,16]) signed char vect_x2_36.35;
...
void * _8;
vector([16,16]) signed char[3] * _9;
...
unsigned char _21;
vector([16,16]) unsigned char _22;
unsigned char _55;
vector([16,16]) unsigned char _56;
...
vector([16,16]) unsigned char _70;
vector([16,16]) unsigned char _84;
unsigned char _85;
...
<bb 5> [local count: 105119324]:
_84 = (vector([16,16]) unsigned char) vect_x0_34.47_82;
_85 = .REDUC_PLUS (_84);
_22 = (vector([16,16]) unsigned char) vect_x1_35.41_38;
_21 = .REDUC_PLUS (_22);
_56 = (vector([16,16]) unsigned char) vect_x2_36.35_58;
_55 = .REDUC_PLUS (_56);
_70 = {_85, _21, _55, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
vect__89.56_3 = vect__87.55_2 + _70;
vect__90.57_4 = VIEW_CONVERT_EXPR<vector([16,16]) signed
char>(vect__89.56_3);
<bb 6> [local count: 118111600]:
# vect_x0_43.58_5 = PHI <vect__90.57_4(5), vect_x0_26.54_1(8)>
.MASK_STORE (b_25(D), 8B, { -1, -1, -1, 0, 0, 0, 0, 0, ... },
vect_x0_43.58_5); [tail call]
return;
The compiled code looks correct, although movi d0,#0 only zeros the first 16
bytes of the variable-length vector constant:
addvl x0, sp, #2
movi d0, #0
st1b z0.b, p6, [sp, #2, mul vl]
uaddv d27, p6, z27.b
uaddv d26, p6, z26.b
uaddv d25, p6, z25.b
str b27, [x0]
addvl x0, sp, #1
add x0, x0, 1
ptrue p7.b, vl3
ld1b z0.b, p6/z, [sp, #2, mul vl]
st1b z0.b, p6, [sp, #1, mul vl]
str b26, [x0]
ld1b z0.b, p6/z, [sp, #1, mul vl]
st1b z0.b, p6, [sp]
str b25, [sp, 2]
ld1b z0.b, p6/z, [sp]
add z28.b, z0.b, z28.b
st1b z28.b, p7, [x1]
addvl sp, sp, #3
.cfi_def_cfa_offset 0
ret
(This code has already been noted to be inefficient, which I plan to address
separately.)
The decision about how many bytes to zero is made in the calling function,
vect_create_constant_vectors (which also uses constant_lower_bound), rather
than in gimple_build_vector:
unsigned int elt_count = group_size;
if (is_a<bb_vec_info> (vinfo))
{
/* We don't use duplicate_and_interleave for basic block
vectorization.
We know that the group size fits within a single vector, so all we need
to do for VLA is to pad the constant to the minimum vector length. */
nunits = constant_lower_bound (TYPE_VECTOR_SUBPARTS (vector_type));
elt_count = MAX (nunits, group_size);
}
My current understanding is that you don't object to this part of my change.
Whatever happens in gimple_build_vector won’t alter the fact that only the
minimum number of bytes are zeroed, and in most cases that’s the desirable
outcome.
I therefore plan to keep my modification to gimple_build_vector, but add an
assertion that builder->encoded_nelts () <= constant_lower_bound
(TYPE_VECTOR_SUBPARTS (builder->type ())) so that the modified function never
builds fewer elements than expected when one of them is non-constant. Would
that be OK?