On Fri, Jul 12, 2024 at 7:20 AM Feng Xue OS <f...@os.amperecomputing.com> wrote:
>
> Hi, Richard,
>
> Let me explain an idea behind a choice that has to be made for lane-reducing.
> The key complication is that these ops are associated with two kinds of
> vec_nums: one is the number of effective vector stmts, which is used by
> partial-vectorization functions such as vect_get_loop_mask; the other is the
> number of total created vector stmts.  We should make the latter aligned with
> normal ops, in order to interoperate with normal ops.  Suppose expressions
> mixing lane-reducing and normal ops, such as:
>
>   temp = lane_reducing<16*char> + expr<4*int>;
>   temp = cst<4*int> * lane_reducing<16*char>;
>
> If we generate only the effective vector stmts for lane_reducing, vector
> def/use between the ops will never match, so extra pass-through copies are
> necessary.  This is why we say "refit a lane-reducing op to be a fake normal
> op".
And this only happens in vect_transform_reduction, right?  The other
pre-existing issue is that for the single_defuse_cycle optimization
SLP_TREE_NUMBER_OF_VEC_STMTS is also off (too large).  But here the
transform also goes through vect_transform_reduction.

> The requirement of two vec_nums is independent of how we will implement
> SLP_TREE_NUMBER_OF_VEC_STMTS.  Moreover, if we want to refactor the vect
> code to unify ncopies/vec_num computation and completely remove
> SLP_TREE_NUMBER_OF_VEC_STMTS, this tends to be a large task, and might be
> overkill for these lane-reducing patches.  So I will keep it as before, and
> do not touch it, as I have done in this patch.
>
> Since one SLP_TREE_NUMBER_OF_VEC_STMTS cannot be used for two purposes,
> your previous suggestion might not work:
>
> > As said, I don't like this much.  vect_slp_analyze_node_operations_1 sets
> > this and I think the existing "exception"
> >
> >   /* Calculate the number of vector statements to be created for the
> >      scalar stmts in this node.  For SLP reductions it is equal to the
> >      number of vector statements in the children (which has already been
> >      calculated by the recursive call).  Otherwise it is the number of
> >      scalar elements in one scalar iteration (DR_GROUP_SIZE) multiplied by
> >      VF divided by the number of elements in a vector.  */
> >   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> >       && !STMT_VINFO_DATA_REF (stmt_info)
> >       && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
> >     {
> >       for (unsigned i = 0; i < SLP_TREE_CHILDREN (node).length (); ++i)
> >         if (SLP_TREE_DEF_TYPE (SLP_TREE_CHILDREN (node)[i])
> >             == vect_internal_def)
> >           {
> >             SLP_TREE_NUMBER_OF_VEC_STMTS (node)
> >               = SLP_TREE_NUMBER_OF_VEC_STMTS (SLP_TREE_CHILDREN (node)[i]);
> >             break;
> >           }
> >     }
> >
> > could be changed (or amended if replacing doesn't work out) to
> >
> >   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> >       && STMT_VINFO_REDUC_IDX (stmt_info)
> >       // do we have this always set?
> >       && STMT_VINFO_REDUC_VECTYPE_IN (stmt_info))
> >     {
> >        do the same as in else {} but using VECTYPE_IN
> >     }
> >
> > Or maybe scrap the special case and use STMT_VINFO_REDUC_VECTYPE_IN
> > when that's set instead of SLP_TREE_VECTYPE?  As said, having a wrong
> > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire.
>
> Then the alternative is to limit the special handling related to the
> vec_num to inside vect_transform_reduction only.  Is that ok?  Or any other
> suggestion?

I think that's kind-of in line with the suggestion of a reduction-specific
VF, so yes, not using SLP_TREE_NUMBER_OF_VEC_STMTS in
vect_transform_reduction sounds fine to me, and would be a step towards not
having SLP_TREE_NUMBER_OF_VEC_STMTS, where the function would be responsible
for the appropriate allocation as well.

Richard.

> Thanks,
> Feng
> ________________________________________
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: Thursday, July 11, 2024 5:43 PM
> To: Feng Xue OS; Richard Sandiford
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp
> reduction with lane-reducing
>
> On Thu, Jul 11, 2024 at 10:53 AM Feng Xue OS
> <f...@os.amperecomputing.com> wrote:
> >
> > The number of vector stmts of an operation is calculated based on the
> > output vectype.  This is over-estimated for lane-reducing operations.
> > Sometimes, to work around the issue, we have to rely on additional logic
> > to deduce an exactly accurate number by other means.  Aiming at this
> > inconvenience, in this patch we "turn" a lane-reducing operation into a
> > normal one by inserting new trivial statements, like zero-valued PHIs and
> > pass-through copies, which can be optimized away by later passes.  At the
> > same time, a new field is added to the slp node to hold the number of
> > vector stmts that are really effective after vectorization.  For example:
>
> Adding Richard into the loop.
>
> I'm sorry, but this feels a bit backwards - in the end I was hoping that we
> can get rid of SLP_TREE_NUMBER_OF_VEC_STMTS completely.
> We do currently have the odd ncopies (non-SLP) vs. vec_num (SLP)
> duality, but in reality all vectorizable_* should know the number of
> stmt copies (or output vector defs) to produce by looking at the vector
> type and the vectorization factor (and in the SLP case the number of
> lanes represented by the node).
>
> That means that in the end vectorizable_* could at transform time
> simply make sure that SLP_TREE_VEC_DEFS is appropriately
> created (currently generic code does this based on
> SLP_TREE_NUMBER_OF_VEC_STMTS and also generic code
> tries to determine SLP_TREE_NUMBER_OF_VEC_STMTS).
>
> There are a lot (well, not too many) uses of SLP_TREE_NUMBER_OF_VEC_STMTS
> that short-cut "appropriate" vec_num computation based on
> VF, vector type and lanes.  I hope all of them could vanish.
>
> You add vect_get_[slp_]num_vectors, but I'd rather see a single
>
>   inline unsigned int
>   vect_get_num_copies (vec_info *vinfo, tree vectype, slp_tree node = nullptr)
>   {
>     if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
>       {
>         if (node)
>           return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
>                                        * SLP_TREE_LANES (node), vectype);
>         else
>           return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
>                                        vectype);
>       }
>     else
>       return vect_get_num_vectors (SLP_TREE_LANES (node), vectype);
>   }
>
> so that
>
>   if (slp_node)
>     {
>       ncopies = 1;
>       vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
>     }
>   else
>     {
>       ncopies = vect_get_num_copies (loop_vinfo, vectype);
>       vec_num = 1;
>     }
>
> can become
>
>   if (slp_node)
>     {
>       ncopies = 1;
>       vec_num = vect_get_num_copies (loop_vinfo, vectype, slp_node);
>     }
>   else
>     {
>       ncopies = vect_get_num_copies (loop_vinfo, vectype/*, slp_node */);
>       vec_num = 1;
>     }
>
> without actually resolving the ncopies/vec_num duality (that will solve
> itself when we have achieved only-SLP).
>
> I think if SLP_TREE_NUMBER_OF_VEC_STMTS is gone then having the
> few places you needed to fix compute the correct number of stmts should be
> OK.  Note this will probably require moving the allocation of
> SLP_TREE_VEC_DEFS to the code doing the transform (and removing/adjusting
> some sanity checking we do).
>
> Thanks,
> Richard.
>
> >   int sum = 1;
> >   for (i)
> >     {
> >       sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
> >     }
> >
> > The vector size is 128-bit, the vectorization factor is 16.  The
> > reduction statements would be transformed as:
> >
> >   vector<4> int sum_v0 = { 0, 0, 0, 1 };
> >   vector<4> int sum_v1 = { 0, 0, 0, 0 };
> >   vector<4> int sum_v2 = { 0, 0, 0, 0 };
> >   vector<4> int sum_v3 = { 0, 0, 0, 0 };
> >
> >   for (i / 16)
> >     {
> >       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
> >       sum_v1 = sum_v1;  // copy
> >       sum_v2 = sum_v2;  // copy
> >       sum_v3 = sum_v3;  // copy
> >     }
> >
> >   sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
> >
> > Thanks,
> > Feng
> >
> > ---
> > gcc/
> >         * tree-vectorizer.h (vec_stmts_effec_size): New field in _slp_tree.
> >         (SLP_TREE_VEC_STMTS_EFFEC_NUM): New macro.
> >         (vect_get_num_vectors): New overload function.
> >         (vect_get_slp_num_vectors): New function.
> >         * tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
> >         Use effective vector stmts number.
> >         (vectorizable_reduction): Compute number of effective vector stmts
> >         for lane-reducing op and reduction PHI.
> >         (vect_transform_reduction): Insert copies for lane-reducing so as
> >         to fix inaccurate vector stmts number.
> >         (vect_transform_cycle_phi): Only need to calculate vector PHI
> >         number based on input vectype for non-slp code path.
> >         * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize effective
> >         vector stmts number to zero.
> >         (vect_slp_analyze_node_operations_1): Remove adjustment on vector
> >         stmts number specific to slp reduction.
> >         (vect_slp_analyze_node_operations): Compute number of vector
> >         elements for constant/external slp node with
> >         vect_get_slp_num_vectors.
> > ---
> >  gcc/tree-vect-loop.cc | 139 ++++++++++++++++++++++++++++++++++++------
> >  gcc/tree-vect-slp.cc  |  56 ++++++-----------
> >  gcc/tree-vectorizer.h |  45 ++++++++++++++
> >  3 files changed, 183 insertions(+), 57 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index c183e2b6068..5ad9836d6c8 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -7471,7 +7471,7 @@ vect_reduction_update_partial_vector_usage (loop_vec_info loop_vinfo,
> >    unsigned nvectors;
> >
> >    if (slp_node)
> > -    nvectors = SLP_TREE_VEC_STMTS_NUM (slp_node);
> > +    nvectors = SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node);
> >    else
> >      nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
> >
> > @@ -7594,6 +7594,15 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >    stmt_vec_info phi_info = stmt_info;
> >    if (!is_a <gphi *> (stmt_info->stmt))
> >      {
> > +      if (lane_reducing_stmt_p (stmt_info->stmt) && slp_node)
> > +       {
> > +         /* Compute number of effective vector statements for lane-reducing
> > +            ops.  */
> > +         vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);
> > +         gcc_assert (vectype_in);
> > +         SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node)
> > +           = vect_get_slp_num_vectors (loop_vinfo, slp_node, vectype_in);
> > +       }
> >        STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
> >        return true;
> >      }
> > @@ -8012,14 +8021,25 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >    if (STMT_VINFO_LIVE_P (phi_info))
> >      return false;
> >
> > -  if (slp_node)
> > -    ncopies = 1;
> > -  else
> > -    ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
> > +  poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
> >
> > -  gcc_assert (ncopies >= 1);
> > +  if (slp_node)
> > +    {
> > +      ncopies = 1;
> >
> > -  poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
> > +      if (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype_in), nunits_out))
> > +       {
> > +         /* Not all vector reduction PHIs would be used; compute number
> > +            of the effective statements.  */
> > +         SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node)
> > +           = vect_get_slp_num_vectors (loop_vinfo, slp_node, vectype_in);
> > +       }
> > +    }
> > +  else
> > +    {
> > +      ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
> > +      gcc_assert (ncopies >= 1);
> > +    }
> >
> >    if (nested_cycle)
> >      {
> > @@ -8360,7 +8380,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> >        || (slp_node
> >            && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
> >            && SLP_TREE_LANES (slp_node) == 1
> > -          && vect_get_num_copies (loop_vinfo, vectype_in) > 1))
> > +          && SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node) > 1))
> >        && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
> >        && reduc_chain_length == 1
> >        && loop_vinfo->suggested_unroll_factor == 1)
> > @@ -8595,12 +8615,15 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
> >    stmt_vec_info phi_info = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info));
> >    gphi *reduc_def_phi = as_a <gphi *> (phi_info->stmt);
> >    int reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
> > -  tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (reduc_info);
> > +  tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (stmt_info);
> > +
> > +  if (!vectype_in)
> > +    vectype_in = STMT_VINFO_VECTYPE (stmt_info);
> >
> >    if (slp_node)
> >      {
> >        ncopies = 1;
> > -      vec_num = SLP_TREE_VEC_STMTS_NUM (slp_node);
> > +      vec_num = SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node);
> >      }
> >    else
> >      {
> > @@ -8702,6 +8725,64 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
> >                        reduc_index == 2 ? op.ops[2] : NULL_TREE,
> >                        &vec_oprnds[2]);
> >      }
> > +  else if (lane_reducing)
> > +    {
> > +      /* For normal reduction, consistency between vectorized def/use is
> > +        naturally ensured when mapping from scalar statement.  But if a
> > +        lane-reducing op is involved in reduction, things become somewhat
> > +        complicated in that the op's result and operand for accumulation
> > +        are limited to fewer lanes than other operands, which certainly
> > +        causes def/use mismatch on adjacent statements around the op if
> > +        we do not make any kind of specific adjustment.  One approach is
> > +        to refit the lane-reducing op by introducing new trivial
> > +        pass-through copies to fix the possible def/use gap, so as to make
> > +        it behave like a normal op.  And vector reduction PHIs are always
> > +        generated to the full extent, no matter whether a lane-reducing op
> > +        exists or not.  If some copies or PHIs are actually superfluous,
> > +        they would be cleaned up by passes after vectorization.  An
> > +        example for single-lane slp is given below.  Similarly, this
> > +        handling is applicable for multiple-lane slp as well.
> > +
> > +          int sum = 1;
> > +          for (i)
> > +            {
> > +              sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
> > +            }
> > +
> > +        The vector size is 128-bit, the vectorization factor is 16.
> > +        The reduction statements would be transformed as:
> > +
> > +          vector<4> int sum_v0 = { 0, 0, 0, 1 };
> > +          vector<4> int sum_v1 = { 0, 0, 0, 0 };
> > +          vector<4> int sum_v2 = { 0, 0, 0, 0 };
> > +          vector<4> int sum_v3 = { 0, 0, 0, 0 };
> > +
> > +          for (i / 16)
> > +            {
> > +              sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
> > +              sum_v1 = sum_v1;  // copy
> > +              sum_v2 = sum_v2;  // copy
> > +              sum_v3 = sum_v3;  // copy
> > +            }
> > +
> > +          sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
> > +       */
> > +      unsigned effec_ncopies = vec_oprnds[0].length ();
> > +      unsigned total_ncopies = vec_oprnds[reduc_index].length ();
> > +
> > +      if (slp_node)
> > +       gcc_assert (effec_ncopies == SLP_TREE_VEC_STMTS_EFFEC_NUM (slp_node));
> > +
> > +      gcc_assert (effec_ncopies <= total_ncopies);
> > +
> > +      if (effec_ncopies < total_ncopies)
> > +       {
> > +         for (unsigned i = 0; i < op.num_ops - 1; i++)
> > +           {
> > +             gcc_assert (vec_oprnds[i].length () == effec_ncopies);
> > +             vec_oprnds[i].safe_grow_cleared (total_ncopies);
> > +           }
> > +       }
> > +    }
> >
> >    bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
> >    unsigned num = vec_oprnds[reduc_index == 0 ? 1 : 0].length ();
> > @@ -8710,7 +8791,27 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
> >      {
> >        gimple *new_stmt;
> >        tree vop[3] = { vec_oprnds[0][i], vec_oprnds[1][i], NULL_TREE };
> > -      if (masked_loop_p && !mask_by_cond_expr)
> > +      if (!vop[0] || !vop[1])
> > +       {
> > +         tree reduc_vop = vec_oprnds[reduc_index][i];
> > +
> > +         /* If we could not generate an effective vector statement for the
> > +            current portion of the reduction operand, insert a trivial copy
> > +            to simply hand over the operand to other dependent
> > +            statements.  */
> > +         gcc_assert (reduc_vop);
> > +
> > +         if (slp_node && TREE_CODE (reduc_vop) == SSA_NAME
> > +             && !SSA_NAME_IS_DEFAULT_DEF (reduc_vop))
> > +           new_stmt = SSA_NAME_DEF_STMT (reduc_vop);
> > +         else
> > +           {
> > +             new_temp = make_ssa_name (vec_dest);
> > +             new_stmt = gimple_build_assign (new_temp, reduc_vop);
> > +             vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +                                          gsi);
> > +           }
> > +       }
> > +      else if (masked_loop_p && !mask_by_cond_expr)
> >         {
> >           /* No conditional ifns have been defined for lane-reducing op
> >              yet.  */
> > @@ -8810,21 +8911,19 @@ vect_transform_cycle_phi (loop_vec_info loop_vinfo,
> >      /* Leave the scalar phi in place.  */
> >      return true;
> >
> > -  tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (reduc_info);
> > -  /* For a nested cycle we do not fill the above.  */
> > -  if (!vectype_in)
> > -    vectype_in = STMT_VINFO_VECTYPE (stmt_info);
> > -  gcc_assert (vectype_in);
> > -
> >    if (slp_node)
> >      {
> > -      /* The size vect_schedule_slp_instance computes is off for us.  */
> > -      vec_num = vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> > -                                     * SLP_TREE_LANES (slp_node), vectype_in);
> > +      vec_num = SLP_TREE_VEC_STMTS_NUM (slp_node);
> >        ncopies = 1;
> >      }
> >    else
> >      {
> > +      tree vectype_in = STMT_VINFO_REDUC_VECTYPE_IN (reduc_info);
> > +      /* For a nested cycle we do not fill the above.  */
> > +      if (!vectype_in)
> > +       vectype_in = STMT_VINFO_VECTYPE (stmt_info);
> > +      gcc_assert (vectype_in);
> > +
> >        vec_num = 1;
> >        ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
> >      }
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 00863398fd0..3c163c16fc8 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -114,6 +114,7 @@ _slp_tree::_slp_tree ()
> >    SLP_TREE_SCALAR_OPS (this) = vNULL;
> >    SLP_TREE_VEC_DEFS (this) = vNULL;
> >    SLP_TREE_VEC_STMTS_NUM (this) = 0;
> > +  SLP_TREE_VEC_STMTS_EFFEC_NUM (this) = 0;
> >    SLP_TREE_CHILDREN (this) = vNULL;
> >    SLP_TREE_LOAD_PERMUTATION (this) = vNULL;
> >    SLP_TREE_LANE_PERMUTATION (this) = vNULL;
> > @@ -6552,38 +6553,16 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
> >                                      slp_instance node_instance,
> >                                      stmt_vector_for_cost *cost_vec)
> >  {
> > -  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
> > -
> >    /* Calculate the number of vector statements to be created for the
> > -     scalar stmts in this node.  For SLP reductions it is equal to the
> > -     number of vector statements in the children (which has already been
> > -     calculated by the recursive call).  Otherwise it is the number of
> > -     scalar elements in one scalar iteration (DR_GROUP_SIZE) multiplied by
> > -     VF divided by the number of elements in a vector.  */
> > -  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> > -      && !STMT_VINFO_DATA_REF (stmt_info)
> > -      && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
> > -    {
> > -      for (unsigned i = 0; i < SLP_TREE_CHILDREN (node).length (); ++i)
> > -       if (SLP_TREE_DEF_TYPE (SLP_TREE_CHILDREN (node)[i])
> > -           == vect_internal_def)
> > -         {
> > -           SLP_TREE_VEC_STMTS_NUM (node)
> > -             = SLP_TREE_VEC_STMTS_NUM (SLP_TREE_CHILDREN (node)[i]);
> > -           break;
> > -         }
> > -    }
> > -  else
> > -    {
> > -      poly_uint64 vf;
> > -      if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> > -       vf = loop_vinfo->vectorization_factor;
> > -      else
> > -       vf = 1;
> > -      unsigned int group_size = SLP_TREE_LANES (node);
> > -      tree vectype = SLP_TREE_VECTYPE (node);
> > -      SLP_TREE_VEC_STMTS_NUM (node)
> > -       = vect_get_num_vectors (vf * group_size, vectype);
> > -    }
> > +     scalar stmts in this node.  */
> > +  unsigned int num = vect_get_slp_num_vectors (vinfo, node);
> > +
> > +  SLP_TREE_VEC_STMTS_NUM (node) = num;
> > +
> > +  /* Initialize with the number of vector statements; this would be changed
> > +     to some smaller one for reduction PHI and lane-reducing statement,
> > +     after the real input vectype is determined by vectorizable analysis.  */
> > +  SLP_TREE_VEC_STMTS_EFFEC_NUM (node) = num;
> >
> >    /* Handle purely internal nodes.  */
> >    if (SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> > @@ -6605,7 +6584,9 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, slp_tree node,
> >        return true;
> >      }
> >
> > +  stmt_vec_info stmt_info = SLP_TREE_REPRESENTATIVE (node);
> >    bool dummy;
> > +
> >    return vect_analyze_stmt (vinfo, stmt_info, &dummy,
> >                              node, node_instance, cost_vec);
> >  }
> > @@ -6851,12 +6832,13 @@ vect_slp_analyze_node_operations (vec_info *vinfo, slp_tree node,
> >                        && j == 1);
> >               continue;
> >             }
> > -         unsigned group_size = SLP_TREE_LANES (child);
> > -         poly_uint64 vf = 1;
> > -         if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> > -           vf = loop_vinfo->vectorization_factor;
> > -         SLP_TREE_VEC_STMTS_NUM (child)
> > -           = vect_get_num_vectors (vf * group_size, vector_type);
> > +
> > +         /* Calculate number of vector elements for constant slp node.  */
> > +         unsigned int num = vect_get_slp_num_vectors (vinfo, child);
> > +
> > +         SLP_TREE_VEC_STMTS_NUM (child) = num;
> > +         SLP_TREE_VEC_STMTS_EFFEC_NUM (child) = num;
> > +
> >           /* And cost them.  */
> >           vect_prologue_cost_for_slp (child, cost_vec);
> >         }
> > diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> > index ef89ff1b3e4..f769c2b8c92 100644
> > --- a/gcc/tree-vectorizer.h
> > +++ b/gcc/tree-vectorizer.h
> > @@ -211,6 +211,15 @@ struct _slp_tree {
> >       divided by vector size.  */
> >    unsigned int vec_stmts_size;
> >
> > +  /* Number of vector stmts that are really effective after transformation.
> > +     A lane-reducing operation requires fewer vector stmts than normal, so
> > +     trivial copies will be inserted to make vector def-use chains align
> > +     mutually.  Vector reduction PHIs are fully generated with VF as above;
> > +     some may be optimized away if they start a copy-self def-use cycle,
> > +     which occurs in a reduction involving only lane-reducing ops.  Note:
> > +     the enforcement of single-defuse-cycle does not impact this; it is
> > +     specially handled.  */
> > +  unsigned int vec_stmts_effec_size;
> > +
> >    /* Reference count in the SLP graph.  */
> >    unsigned int refcnt;
> >    /* The maximum number of vector elements for the subtree rooted
> > @@ -304,6 +313,7 @@ public:
> >  #define SLP_TREE_REF_COUNT(S)                   (S)->refcnt
> >  #define SLP_TREE_VEC_DEFS(S)                    (S)->vec_defs
> >  #define SLP_TREE_VEC_STMTS_NUM(S)               (S)->vec_stmts_size
> > +#define SLP_TREE_VEC_STMTS_EFFEC_NUM(S)         (S)->vec_stmts_effec_size
> >  #define SLP_TREE_LOAD_PERMUTATION(S)            (S)->load_permutation
> >  #define SLP_TREE_LANE_PERMUTATION(S)            (S)->lane_permutation
> >  #define SLP_TREE_SIMD_CLONE_INFO(S)             (S)->simd_clone_info
> > @@ -2080,6 +2090,41 @@ vect_get_num_vectors (poly_uint64 nunits, tree vectype)
> >    return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
> >  }
> >
> > +/* Return the number of vectors in the context of vectorization region
> > +   VINFO, needed for a group of total SIZE statements that are supposed to
> > +   be interleaved together with no gap, and all operate on vectors of type
> > +   VECTYPE.  */
> > +
> > +inline unsigned int
> > +vect_get_num_vectors (vec_info *vinfo, tree vectype, unsigned int size)
> > +{
> > +  poly_uint64 vf;
> > +
> > +  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> > +    vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> > +  else
> > +    vf = 1;
> > +
> > +  return vect_get_num_vectors (vf * size, vectype);
> > +}
> > +
> > +/* Return the number of vectors needed for SLP_NODE with regard to VINFO.
> > +   If not NULL, VECTYPE gives a new vectype to operate on, rather than
> > +   SLP_TREE_VECTYPE of the node.  */
> > +
> > +inline unsigned int
> > +vect_get_slp_num_vectors (vec_info *vinfo, slp_tree slp_node,
> > +                         tree vectype = NULL_TREE)
> > +{
> > +  if (!vectype)
> > +    {
> > +      vectype = SLP_TREE_VECTYPE (slp_node);
> > +      gcc_checking_assert (vectype);
> > +    }
> > +
> > +  return vect_get_num_vectors (vinfo, vectype, SLP_TREE_LANES (slp_node));
> > +}
> > +
> >  /* Return the number of copies needed for loop vectorization when
> >     a statement operates on vectors of type VECTYPE.  This is the
> >     vectorization factor divided by the number of elements in
> > --
> > 2.17.1