Hi Richi,

Thanks for the comments!

on 2021/8/16 下午2:49, Richard Biener wrote:
> On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> IIUC, the function vectorizable_bb_reduc_epilogue missed to
>> consider the cost to extract the final value from the vector
>> for reduc operations.  This patch is to add one time of
>> vec_to_scalar cost for extracting.
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9.
>> The testing on x86_64 and aarch64 is ongoing.
>>
>> Is it ok for trunk?
> 
> There's no such instruction necessary, the way the costing works
> the result is in lane zero already.  Note the optabs are defined
> to reduce to a scalar already.  So if your arch implements those and
> requires such move then the backend costing needs to handle that.
> 

Yes, these reduc_<operation>_scal_<mode> should have made the
operand[0] as the final scalar result.

> That said, ideally we'd simply cost the IFN_REDUC_* in the backend
> but for BB reductions we don't actually build a SLP node with such
> representative stmt to pass down (yet).
> 

OK, thanks for the explanation.  It explains why we cost the 
IFN_REDUC_* as one vect_stmt in loop vect but cost it as
conservative (shuffle and reduc_op) as possible here.

> I guess you're running into a integer reduction where there's
> a vector -> gpr move missing in costing?  I suppose costing
> vec_to_scalar works for that but in the end we should maybe
> find a way to cost the IFN_REDUC_* ...

Yeah, it's a reduction on plus, initially I wanted to adjust backend
costing for various IFN_REDUC* (since for some variants Power has more
than one instructions for them), then I noticed we cost the reduction
as shuffle and reduc_op during SLP for now, I guess it's good to get
vec_to_scalar considered here for consistency?  Then it can be removed
together when we have a better modeling in the end? 

BR,
Kewen

> 
> Richard.
> 
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>         * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
>>         value extraction.
>>
>> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
>> index b9d88c2d943..841a0872afa 100644
>> --- a/gcc/tree-vect-slp.c
>> +++ b/gcc/tree-vect-slp.c
>> @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance 
>> instance,
>>      return false;
>>
>>    /* There's no way to cost a horizontal vector reduction via REDUC_FN so
>> -     cost log2 vector operations plus shuffles.  */
>> +     cost log2 vector operations plus shuffles and one extraction.  */
>>    unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
>>    record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
>>                     vectype, 0, vect_body);
>>    record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
>>                     vectype, 0, vect_body);
>> +  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
>> +                   vectype, 0, vect_body);
>>    return true;
>>  }


Reply via email to