Re: [RFC 3/9] Implement recording/getting of mask/length for BB SLP

Christopher Bazley Wed, 05 Nov 2025 06:45:55 -0800


On 28/10/2025 13:29, Richard Biener wrote:

On Tue, 28 Oct 2025, Christopher Bazley wrote:

+tree
+vect_slp_get_bb_mask (slp_tree slp_node, gimple_stmt_iterator *gsi,
+                     unsigned int nvectors, tree vectype, unsigned int index)
+{
+  gcc_checking_assert (SLP_TREE_CAN_USE_MASK_P (slp_node));
+
+  /* Only the last vector can be a partial vector.  */
+  if (index < nvectors - 1)
+    return NULL_TREE;
+
+  /* vect_get_num_copies only allows a partial vector if it is the only
+     vector.  */
+  if (nvectors > 1)
+    return NULL_TREE;
+
+  gcc_checking_assert (nvectors == 1);
+
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int group_size = SLP_TREE_LANES (slp_node);

In particular I think that in general with partial vectors the group_size
is not equal to the number of scalar lanes but instead is computed by
the "VF", thus equal to max_nunits?  This means we have to be careful
what we deal with or rather what we want to record, given locally for
a SLP node we can only compute its own total nunits based on the
number of scalar lanes and the vector type.

group_size comes from scalar_stmts.length () or ops.length () in the overloaded function vect_create_new_slp_node or from TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode)) in vect_build_slp_tree_2.

LOOP_VINFO_VECT_FACTOR ("VF") is only stored for the loop vectoriser instances, therefore it cannot affect BB SLP. The vect_slp_get_bb_mask function is only used for BB SLP.

The main use of nunits.max is in calculate_unrolling_factor, which does not require it to be equal to group_size or an integral multiple of group_size, nor vice-versa. The design of that function implies that nunits.max is expected to be divisible by group_size though, so I guess it does something like VF = group_size * nunits.max.
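
For illustration only (this is not code from the patch): a minimal sketch in plain C of one way an unrolling factor could be derived from the maximum number of vector elements and the group size, assuming it is the least common multiple of the two divided by the group size. The names sketch_gcd and sketch_unrolling_factor are hypothetical, and plain unsigned integers stand in for GCC's poly_uint64.

```c
#include <assert.h>

/* Hypothetical helper: greatest common divisor of A and B
   (Euclid's algorithm).  */
static unsigned int
sketch_gcd (unsigned int a, unsigned int b)
{
  while (b != 0)
    {
      unsigned int t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Hypothetical helper: number of copies of a group of GROUP_SIZE
   lanes needed to fill a whole number of vectors of MAX_NUNITS
   elements.  Assumption: this equals
   lcm (MAX_NUNITS, GROUP_SIZE) / GROUP_SIZE, which simplifies to
   MAX_NUNITS / gcd (MAX_NUNITS, GROUP_SIZE).  */
static unsigned int
sketch_unrolling_factor (unsigned int max_nunits, unsigned int group_size)
{
  assert (max_nunits > 0 && group_size > 0);
  return max_nunits / sketch_gcd (max_nunits, group_size);
}
```

Under that assumption, neither value has to be a multiple of the other: sketch_unrolling_factor (4, 2) is 2, sketch_unrolling_factor (4, 3) is 4, and sketch_unrolling_factor (4, 8) is 1.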

So we might want to explicitly record the group size.  In any case

Sorry but I'm not sure what you are suggesting here. The group size is already explicitly recorded in the SLP node (as 'lanes', although the term 'group_size' seems to be overloaded in the vectoriser -- e.g., group_size can also be DR_GROUP_SIZE in vectorizable_load or vectorizable_store).

'nvectors' should be also correct here, the question is how we
compute that right now.

nvectors is vector_unroll_factor (i.e. SLP_TREE_LANES / simdlen for BB SLP) in vectorizable_simd_clone_call, vect_get_num_copies (i.e. SLP_TREE_LANES / TYPE_VECTOR_SUBPARTS for BB SLP) in vectorizable_operation, and vect_get_num_copies / DR_GROUP_SIZE in vectorizable_{store|load}. The vect_get_mask function is also called with values of nvectors between 0 and vect_get_num_copies.

I assumed that arguments that are valid for vect_get_loop_mask would also be valid for my new function, vect_slp_get_bb_mask, because my intention was always to share as much code as possible between loop vectorisation and SLP. It's likely that some of the code is not optimal as a result.
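
For illustration only (this is not the patch itself): a simplified, standalone model in plain C of the early-outs in the quoted vect_slp_get_bb_mask, using a bool array in place of a mask tree. The name sketch_bb_mask is hypothetical. The final step, comparing the lane count with the number of vector elements and activating only the leading lanes, is an assumption about the part of the function that is not quoted above.

```c
#include <stdbool.h>

/* Hypothetical model of the mask decision for BB SLP.  LANES is the
   number of scalar lanes in the node, NUNITS the number of elements in
   the vector type, NVECTORS the number of vectors and INDEX the vector
   being asked about.  MASK must have room for NUNITS elements.
   Returns false where the real function would return NULL_TREE
   (no mask needed); otherwise fills MASK and returns true.  */
static bool
sketch_bb_mask (unsigned int lanes, unsigned int nunits,
                unsigned int nvectors, unsigned int index, bool *mask)
{
  /* Only the last vector can be a partial vector.  Written as
     index + 1 < nvectors so that nvectors == 0 does not wrap.  */
  if (index + 1 < nvectors)
    return false;

  /* A partial vector is only modelled when it is the only vector.  */
  if (nvectors != 1)
    return false;

  /* Assumption: a vector that is completely filled needs no mask.  */
  if (lanes >= nunits)
    return false;

  /* Assumption: the first LANES elements are active, the rest are not.  */
  for (unsigned int i = 0; i < nunits; i++)
    mask[i] = i < lanes;
  return true;
}
```

With lanes = 3 and nunits = 4 as the only vector, this yields the mask {1, 1, 1, 0}; with lanes = 4 and nunits = 4, or with more than one vector, it reports that no mask is needed.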


--
Christopher Bazley
Staff Software Engineer, GNU Tools Team.
Arm Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
http://www.arm.com/

Re: [RFC 3/9] Implement recording/getting of mask/length for BB SLP
