Re: [RFC 3/9] Implement recording/getting of mask/length for BB SLP

Richard Biener Fri, 07 Nov 2025 05:44:22 -0800

On Wed, 5 Nov 2025, Christopher Bazley wrote:

> 
> On 28/10/2025 13:29, Richard Biener wrote:
> > On Tue, 28 Oct 2025, Christopher Bazley wrote:
> >
> >> +tree
> >> +vect_slp_get_bb_mask (slp_tree slp_node, gimple_stmt_iterator *gsi,
> >> +                unsigned int nvectors, tree vectype, unsigned int index)
> >> +{
> >> +  gcc_checking_assert (SLP_TREE_CAN_USE_MASK_P (slp_node));
> >> +
> >> +  /* Only the last vector can be a partial vector.  */
> >> +  if (index < nvectors - 1)
> >> +    return NULL_TREE;
> >> +
> >> +  /* vect_get_num_copies only allows a partial vector if it is the only
> >> +     vector.  */
> >> +  if (nvectors > 1)
> >> +    return NULL_TREE;
> >> +
> >> +  gcc_checking_assert (nvectors == 1);
> >> +
> >> +  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
> >> +  unsigned int group_size = SLP_TREE_LANES (slp_node);
> > In particular I think that in general with partial vectors the group_size
> > is not equal to the number of scalar lanes but instead is computed by
> > the "VF", thus equal to max_nunits?  This means we have to be careful
> > what we deal with or rather what we want to record, given locally for
> > a SLP node we can only compute its own total nunits based on the
> > number of scalar lanes and the vector type.
> 
> group_size comes from scalar_stmts.length () or ops.length () in the
> overloaded function vect_create_new_slp_node or from TYPE_VECTOR_SUBPARTS
> (SLP_TREE_VECTYPE (vnode)) in vect_build_slp_tree_2.
> 
> LOOP_VINFO_VECT_FACTOR ("VF") is only stored for the loop vectoriser
> instances, therefore it cannot affect BB SLP. The vect_slp_get_bb_mask 
> function is only used for BB SLP.
> 
> The main use of nunits.max is in calculate_unrolling_factor, which does not
> require it to be equal to group_size or an integral multiple of group_size,
> nor vice-versa. The design of that function implies that nunits.max is
> expected to be divisible by group_size though, so I guess it does something
> like VF = group_size * nunits.max.
> 
> > So we might want to explicitly record the group size.  In any case
> 
> Sorry but I'm not sure what you are suggesting here. The group size is already
> explicitly recorded in the SLP node (as 'lanes', although the term
> 'group_size' seems to be overloaded in the vectoriser -- e.g., group_size can
> also be DR_GROUP_SIZE in vectorizable_load or vectorizable_store).
> 
> > 'nvectors' should be also correct here, the question is how we
> > compute that right now.
> 
> nvectors is vector_unroll_factor (i.e. SLP_TREE_LANES / simdlen for BB SLP) in
> vectorizable_simd_clone_call, vect_get_num_copies (i.e. SLP_TREE_LANES /
> TYPE_VECTOR_SUBPARTS for BB SLP) in vectorizable_operation, and
> vect_get_num_copies / DR_GROUP_SIZE in vectorizable_{store|load}. The
> vect_get_mask function is also called with values of nvectors between 0
> and vect_get_num_copies.
> 
> I assumed that arguments that are valid for vect_get_loop_mask would also be
> valid for my new function, vect_slp_get_bb_mask, because my intention was
> always to share as much code as possible between loop vectorisation and SLP.
> It's likely that some of the code is not optimal as a result.

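For reference, the control flow of the quoted vect_slp_get_bb_mask excerpt
can be modelled with plain integers roughly as follows.  This is only a
sketch: sketch_bb_mask is a hypothetical name, std::nullopt stands in for
NULL_TREE, fixed-length vectors are assumed (no poly_uint64), and the
"full vector needs no mask" branch is an assumption, since the quoted
patch is cut off before that point.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for the quoted vect_slp_get_bb_mask, using plain
// integers.  Returns std::nullopt where the quoted code returns NULL_TREE.
std::optional<std::vector<bool>>
sketch_bb_mask (unsigned lanes, unsigned nunits,
                unsigned nvectors, unsigned index)
{
  // Only the last vector can be a partial vector.  Written as
  // index + 1 < nvectors so that nvectors == 0 does not wrap around.
  if (index + 1 < nvectors)
    return std::nullopt;

  // A partial vector is only allowed if it is the only vector.
  if (nvectors > 1)
    return std::nullopt;

  // Assumption (not visible in the quoted excerpt): a node whose lanes
  // fill the whole vector needs no mask.
  if (lanes >= nunits)
    return std::nullopt;

  // The first 'lanes' elements are active, the remainder is padding.
  std::vector<bool> mask (nunits, false);
  for (unsigned i = 0; i < lanes; ++i)
    mask[i] = true;
  return mask;
}
```

With 3 scalar lanes and a 4-element vector type this yields a mask with
the first three elements set; with 4 lanes, or with more than one vector,
it yields no mask at all.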

Well yes, I know all this.  But when we now add padding, the question
is where we should track that (or, as you seem to imply, not track it).
Do we increase group_size to reflect that the actual vectors have more
lanes?  Do we just track that in max_nunits somehow?

The purpose of max_nunits for BB vectorization is solely to detect
the case that we do not have sufficient lanes in the SLP node to
fill the vector lanes of the vector type we chose, thus we'd need
"unrolling".
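
To make the two options concrete, here is a minimal sketch with plain
integers.  It assumes that, for fixed-length vectors,
calculate_unrolling_factor reduces to lcm (nunits, group_size) / group_size;
all function names below are hypothetical and not part of the patch.

```cpp
#include <numeric>  // std::lcm (C++17)

// Assumed model of calculate_unrolling_factor for fixed-length vectors.
// group_size must be non-zero.
unsigned
sketch_unrolling_factor (unsigned max_nunits, unsigned group_size)
{
  return std::lcm (max_nunits, group_size) / group_size;
}

// BB vectorization cannot unroll, so any factor other than 1 means the
// SLP node does not have enough lanes to fill the chosen vector type.
bool
sketch_bb_needs_unrolling (unsigned max_nunits, unsigned group_size)
{
  return sketch_unrolling_factor (max_nunits, group_size) != 1;
}

// One way to track padding: round the group size up to a whole number
// of vectors, so that it reflects the lanes the vectors actually have.
unsigned
sketch_padded_group_size (unsigned group_size, unsigned nunits)
{
  return (group_size + nunits - 1) / nunits * nunits;
}
```

Under this model a node with 3 lanes and a 4-element vector type has
unrolling factor 4 and is rejected today; if group_size were padded to 4
the factor would become 1, which is the "increase group_size" option.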

Richard.

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [RFC 3/9] Implement recording/getting of mask/length for BB SLP
