Hi Richi, Thanks for the comments!
on 2021/8/16 下午2:49, Richard Biener wrote: > On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <li...@linux.ibm.com> wrote: >> >> Hi, >> >> IIUC, the function vectorizable_bb_reduc_epilogue missed to >> consider the cost to extract the final value from the vector >> for reduc operations. This patch is to add one time of >> vec_to_scalar cost for extracting. >> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9. >> The testing on x86_64 and aarch64 is ongoing. >> >> Is it ok for trunk? > > There's no such instruction necessary, the way the costing works > the result is in lane zero already. Note the optabs are defined > to reduce to a scalar already. So if your arch implements those and > requires such move then the backend costing needs to handle that. > Yes, these reduc_<operation>_scal_<mode> should have made the operand[0] as the final scalar result. > That said, ideally we'd simply cost the IFN_REDUC_* in the backend > but for BB reductions we don't actually build a SLP node with such > representative stmt to pass down (yet). > OK, thanks for the explanation. It explains why we cost the IFN_REDUC_* as one vect_stmt in loop vect but cost it as conservative (shuffle and reduc_op) as possible here. > I guess you're running into a integer reduction where there's > a vector -> gpr move missing in costing? I suppose costing > vec_to_scalar works for that but in the end we should maybe > find a way to cost the IFN_REDUC_* ... Yeah, it's a reduction on plus, initially I wanted to adjust backend costing for various IFN_REDUC* (since for some variants Power has more than one instructions for them), then I noticed we cost the reduction as shuffle and reduc_op during SLP for now, I guess it's good to get vec_to_scalar considered here for consistency? Then it can be removed together when we have a better modeling in the end? BR, Kewen > > Richard. > >> BR, >> Kewen >> ----- >> gcc/ChangeLog: >> >> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for >> value extraction. >> >> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c >> index b9d88c2d943..841a0872afa 100644 >> --- a/gcc/tree-vect-slp.c >> +++ b/gcc/tree-vect-slp.c >> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance >> instance, >> return false; >> >> /* There's no way to cost a horizontal vector reduction via REDUC_FN so >> - cost log2 vector operations plus shuffles. */ >> + cost log2 vector operations plus shuffles and one extraction. */ >> unsigned steps = floor_log2 (vect_nunits_for_cost (vectype)); >> record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0], >> vectype, 0, vect_body); >> record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0], >> vectype, 0, vect_body); >> + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0], >> + vectype, 0, vect_body); >> return true; >> }