On 07/11/2025 13:42, Richard Biener wrote:
On Wed, 5 Nov 2025, Christopher Bazley wrote:

On 28/10/2025 13:29, Richard Biener wrote:
On Tue, 28 Oct 2025, Christopher Bazley wrote:

+tree
+vect_slp_get_bb_mask (slp_tree slp_node, gimple_stmt_iterator *gsi,
+                     unsigned int nvectors, tree vectype, unsigned int index)
+{
+  gcc_checking_assert (SLP_TREE_CAN_USE_MASK_P (slp_node));
+
+  /* Only the last vector can be a partial vector.  */
+  if (index < nvectors - 1)
+    return NULL_TREE;
+
+  /* vect_get_num_copies only allows a partial vector if it is the only
+     vector.  */
+  if (nvectors > 1)
+    return NULL_TREE;
+
+  gcc_checking_assert (nvectors == 1);
+
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int group_size = SLP_TREE_LANES (slp_node);
In particular I think that in general with partial vectors the group_size
is not equal to the number of scalar lanes but instead is computed from
the "VF", thus equal to max_nunits?  This means we have to be careful
what we deal with, or rather what we want to record, given that locally
for an SLP node we can only compute its own total nunits based on the
number of scalar lanes and the vector type.
group_size comes from scalar_stmts.length () or ops.length () in the
overloaded function vect_create_new_slp_node or from TYPE_VECTOR_SUBPARTS
(SLP_TREE_VECTYPE (vnode)) in vect_build_slp_tree_2.

LOOP_VINFO_VECT_FACTOR ("VF") is only stored for the loop vectoriser
instances, therefore it cannot affect BB SLP. The vect_slp_get_bb_mask
function is only used for BB SLP.

The main use of nunits.max is in calculate_unrolling_factor, which does not
require it to be equal to group_size or an integral multiple of group_size,
nor vice-versa. The design of that function implies that nunits.max is
expected to be divisible by group_size though, so I guess it does something
like VF = group_size * nunits.max.
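
For reference, the definition I was looking at is roughly the following
(paraphrased from tree-vect-slp.cc from memory, so treat it as an
approximation rather than a quotation):

  /* Unroll the SLP instance by the smallest factor that makes the group
     fill a whole number of vectors, i.e. lcm (nunits, group_size) /
     group_size.  */
  static poly_uint64
  calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
  {
    return exact_div (common_multiple (nunits, group_size), group_size);
  }

so, as far as I can see, the factor collapses to 1 exactly when nunits
divides group_size.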

So we might want to explicitly record the group size.  In any case
Sorry, but I'm not sure what you are suggesting here. The group size is
already explicitly recorded in the SLP node (as 'lanes', although the term
'group_size' seems to be overloaded in the vectoriser -- e.g., group_size
can also be DR_GROUP_SIZE in vectorizable_load or vectorizable_store).

'nvectors' should also be correct here; the question is how we
compute that right now.
nvectors is vector_unroll_factor (i.e. SLP_TREE_LANES / simdlen for BB SLP) in
vectorizable_simd_clone_call, vect_get_num_copies (i.e. SLP_TREE_LANES /
TYPE_VECTOR_SUBPARTS for BB SLP) in vectorizable_operation, and
vect_get_num_copies / DR_GROUP_SIZE in vectorizable_{store|load}. The
vect_get_mask function is also called with values of nvectors between 0
and vect_get_num_copies.
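
To spell out what I expect the BB SLP value to be once partial vectors
are allowed, here is an illustrative sketch only (bb_slp_nvectors is a
made-up name, and I have fixed nunits as a compile-time constant for
simplicity; it is not how the existing helpers are written):

  /* Made-up helper: how many vectors of type VECTYPE are needed for the
     scalar lanes of NODE, allowing the last vector to be partial (i.e.
     rounding up rather than requiring an exact division).  */
  static unsigned int
  bb_slp_nvectors (slp_tree node, tree vectype)
  {
    unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
    unsigned int lanes = SLP_TREE_LANES (node);
    return (lanes + nunits - 1) / nunits;
  }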

I assumed that arguments that are valid for vect_get_loop_mask would also be
valid for my new function, vect_slp_get_bb_mask, because my intention was
always to share as much code as possible between loop vectorisation and SLP.
It's likely that some of the code is not optimal as a result.
Well yes, I know all this.  But when we now add padding, the question
is where we should track that (or, as you seem to imply, not track it).
Do we increase group_size to reflect that the actual vectors have more
lanes?  Do we just track that in max_nunits somehow?
SLP_TREE_LANES gives the unpadded size of a group (i.e. the number of
active lanes), which seems reasonable to me because that is the actual
size of the group. slp_tree_nunits simply gives the range of
TYPE_VECTOR_SUBPARTS (vectype), i.e. including any inactive lanes in
the last vector of the group. So, no, max_nunits (or rather, nunits.max
in my patch) does not include padding and I didn't want/need to change
it to do so.

I don't yet understand why you want the amount of padding to be tracked
independently from TYPE_VECTOR_SUBPARTS (vectype) - SLP_TREE_LANES
(slp_node). Even if vect_slp_get_bb_mask were modified to produce masks
for partial vectors in cases where nvectors > 1, the amount of padding
required (or, more usefully in this function, the number of unmasked
bits) could still be derived from the vectype and group_size by using
the remainder of a division, e.g., TYPE_VECTOR_SUBPARTS (vectype) -
(SLP_TREE_LANES (slp_node) % TYPE_VECTOR_SUBPARTS (vectype)).
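
As a concrete made-up example: with a group of 6 lanes and a 4-lane
vector type, the last vector would have 6 % 4 == 2 active lanes, so the
padding is 4 - (6 % 4) == 2 lanes; both quantities fall out of the same
remainder without recording any extra state in the SLP node.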

The purpose of max_nunits for BB vectorization is solely to detect
the case that we do not have sufficient lanes in the SLP node to
fill the vector lanes of the vector type we chose, thus we'd need
"unrolling".

Richard.

My understanding of calculate_unrolling_factor and its calling code in
vect_analyze_slp_instance and vect_build_slp_instance is that unrolling
is required if the group size is less than the maximum number of lanes
of all of the chosen vector types, or the group size is greater than
nunits.max but not exactly divisible by it. If padding lanes were
included in the group size, it might prevent correct detection of when
unrolling is required. (If you say that unrolling is never required for
BB SLP, that would require restructuring of the control flow in
vect_analyze_slp_instance because currently all of the SLP-specific
code is in the "unrolling required" block.)
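
To make those two conditions concrete with made-up numbers: with a
4-lane vector type (nunits.max == 4), a 3-lane group is smaller than
nunits.max so an unrolling factor of 4 would be needed, a 6-lane group
is larger but not divisible so a factor of 2 would be needed, and only
groups of 4, 8, 12, ... lanes avoid unrolling. Padding a 3-lane or
6-lane group up to a multiple of 4 would make all three cases
indistinguishable.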

For BB SLP, the vectoriser used to give up completely if the group size
was less than the maximum number of lanes of all of the chosen vector
types ("...do not have sufficient lanes in the SLP node to fill the
vector lanes of the vector type we chose..."), or split the group if
the group size was greater than nunits.max but not exactly divisible
by it.

If the minimum number of lanes across all of the chosen vector types is
sufficient to store the whole group then it might be possible to use
tail predication, which is why I added !known_ge (nunits.min,
group_size) to the conjunction that must be true before entering that
block. My modification does not prevent groups bigger than nunits.max
from being split, including in cases where one of the new groups
resulting from such a split can be handled by tail predication.
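
For example (again with made-up numbers), a 3-lane group and a vector
type whose minimum lane count is 4 give known_ge (nunits.min,
group_size), so the !known_ge term is false, the give-up/split block is
skipped, and the group can be handled as a single partially predicated
vector instead.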

Are you suggesting that such splits should be avoided? If so, please could you explain the rationale?

Thanks,

--
Christopher Bazley
Staff Software Engineer, GNU Tools Team.
Arm Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
http://www.arm.com/
