On 28/10/2025 13:29, Richard Biener wrote:
On Tue, 28 Oct 2025, Christopher Bazley wrote:

+tree
+vect_slp_get_bb_mask (slp_tree slp_node, gimple_stmt_iterator *gsi,
+                     unsigned int nvectors, tree vectype, unsigned int index)
+{
+  gcc_checking_assert (SLP_TREE_CAN_USE_MASK_P (slp_node));
+
+  /* Only the last vector can be a partial vector.  */
+  if (index < nvectors - 1)
+    return NULL_TREE;
+
+  /* vect_get_num_copies only allows a partial vector if it is the only
+     vector.  */
+  if (nvectors > 1)
+    return NULL_TREE;
+
+  gcc_checking_assert (nvectors == 1);
+
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned int group_size = SLP_TREE_LANES (slp_node);
In particular I think that in general with partial vectors the group_size
is not equal to the number of scalar lanes but instead is computed by
the "VF", thus equal to max_nunits?  This means we have to be careful
what we deal with or rather what we want to record, given locally for
a SLP node we can only compute its own total nunits based on the
number of scalar lanes and the vector type.

group_size comes from scalar_stmts.length () or ops.length () in the overloaded function vect_create_new_slp_node or from TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode)) in vect_build_slp_tree_2.

LOOP_VINFO_VECT_FACTOR ("VF") is only stored for loop vectoriser instances, so it cannot affect BB SLP. The vect_slp_get_bb_mask function is only used for BB SLP.

The main use of nunits.max is in calculate_unrolling_factor, which does not require it to be equal to group_size or an integral multiple of group_size, nor vice versa. The design of that function implies that nunits.max is expected to be divisible by group_size, though, so I guess it does something like VF = group_size * nunits.max.
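
To make the relationship between the two values concrete, here is a minimal sketch with plain integers instead of poly_uint64. It assumes the unrolling factor is lcm (nunits, group_size) / group_size, which is my reading of calculate_unrolling_factor and has not been checked against the current sources. The names model_gcd and model_unrolling_factor are hypothetical and do not exist in GCC.

```c
#include <assert.h>

/* Greatest common divisor, by Euclid's algorithm.  */
static unsigned int
model_gcd (unsigned int a, unsigned int b)
{
  while (b != 0)
    {
      unsigned int t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Hypothetical stand-in for calculate_unrolling_factor, restricted to
   fixed-length vectors.  Assumption: the factor is
   lcm (nunits, group_size) / group_size.  */
static unsigned int
model_unrolling_factor (unsigned int nunits, unsigned int group_size)
{
  assert (nunits > 0 && group_size > 0);
  unsigned int lcm = nunits / model_gcd (nunits, group_size) * group_size;
  return lcm / group_size;
}
```

Under that assumption, with nunits = 4 a group of 3 lanes gives a factor of 4 and a group of 8 lanes gives a factor of 1, so neither value has to divide the other.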

So we might want to explicitly record the group size.  In any case

Sorry, but I'm not sure what you are suggesting here. The group size is already explicitly recorded in the SLP node (as 'lanes'), although the term 'group_size' seems to be overloaded in the vectoriser: for example, group_size can also be DR_GROUP_SIZE in vectorizable_load or vectorizable_store.

'nvectors' should be also correct here, the question is how we
compute that right now.

nvectors is vector_unroll_factor (i.e. SLP_TREE_LANES / simdlen for BB SLP) in vectorizable_simd_clone_call, vect_get_num_copies (i.e. SLP_TREE_LANES / TYPE_VECTOR_SUBPARTS for BB SLP) in vectorizable_operation, and vect_get_num_copies / DR_GROUP_SIZE in vectorizable_{store|load}. The vect_get_mask function is also called with values of nvectors between 0 and vect_get_num_copies.
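
As a rough illustration of the BB SLP case in vectorizable_operation, the sketch below models the copy count as SLP_TREE_LANES / TYPE_VECTOR_SUBPARTS with fixed-length vectors, plus the rule from the comment in the patch that a partial vector is only allowed when it is the only vector. The function names are hypothetical, and the mask condition (one vector, fewer lanes than elements) is my assumption about the intent rather than a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the copy count for BB SLP with fixed-length
   vectors: lanes / nunits, except that a single partial vector is
   allowed when the lanes do not fill even one vector.  */
static unsigned int
model_num_copies (unsigned int lanes, unsigned int nunits)
{
  assert (lanes > 0 && nunits > 0);
  if (lanes < nunits)
    return 1;  /* The only vector, partially filled.  */
  assert (lanes % nunits == 0);  /* Otherwise the division is exact.  */
  return lanes / nunits;
}

/* Assumed condition for needing a mask on vector INDEX: there is
   exactly one vector and it has fewer lanes than elements.  */
static bool
model_needs_mask (unsigned int lanes, unsigned int nunits,
                  unsigned int index)
{
  unsigned int nvectors = model_num_copies (lanes, nunits);
  assert (index < nvectors);
  return nvectors == 1 && lanes < nunits;
}
```

In this model, 3 lanes in a 4-element vector type give one masked vector, while 8 lanes give two full vectors and no mask.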

I assumed that arguments that are valid for vect_get_loop_mask would also be valid for my new function, vect_slp_get_bb_mask, because my intention was always to share as much code as possible between loop vectorisation and SLP. It's likely that some of the code is not optimal as a result.

--
Christopher Bazley
Staff Software Engineer, GNU Tools Team.
Arm Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
http://www.arm.com/
