Hi Segher,

on 2019/9/29 3:28 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Sun, Sep 29, 2019 at 01:38:31PM +0800, Kewen.Lin wrote:
>> Recently we are revisiting vectorization cost setting in 
>> rs6000_builtin_vectorization_cost, and found the current cost of
>> vec_perm on VSX looks overpriced for Power8 and Power9.
> 
> Yeah it does.
> 
>> The high
>> cost was set for Power7 single VSU pipe, but Power8 and Power9
>> have supported more VSX units,
> 
> Power7 has two VSU pipes, but only one permute (and only one of the other
> VMX units, and only one store).  It can do two VSX or two classic FP, and
> various combinations.
> 

Yes, only one unit for vector permute.  Thanks for the correction!

>> the performance evaluation on
>> SPEC2017 Power9 shows 2%+ gain on 538.imagick_r, while SPEC2006/
>> SPEC2017 Power8 evaluations show ~2% gains on 453.povray/
>> 511.povray_r, all don't have any remarkable degradations.
> 
> Nice :-)  What does it do to geomean, is that in the right direction
> as well?
> 

The geomean speedups of the ratios are: SPEC2006 P8 +0.08%, SPEC2017 P8
+0.16%, and SPEC2017 P9 +0.23%.

> As a follow-up, vec_promote_demote is costed similarly higher for VSX
> than for other things, it might help to change that one as well?
> 

You are right.  I found it's helpful for 525.x264_r, but it caused one
suite degradation on P9; I'm still investigating it.

>> This patch is to lower vec_perm vectorization cost for all
>> non Power7 VSX architecture (currently Power8 and Power9).
> 
> Okay for trunk with the comment fixed (just say p7 has only one permute
> unit?), and assuming geomean is fine too.

Thanks!  I'll update it by saying "Power7 has only one permute unit".
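
For illustration only, here is a minimal standalone sketch of the idea, not
the patch itself.  The enum, the function name, and the cost values are all
hypothetical stand-ins chosen for this example; the real logic lives in
rs6000_builtin_vectorization_cost and uses GCC's own target macros.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the tuning targets; not GCC's real enum.  */
enum tune_cpu { TUNE_POWER7, TUNE_POWER8, TUNE_POWER9 };

/* Illustrative vec_perm cost.  The numbers 3 and 1 are placeholders,
   picked only to show "higher on Power7 VSX, lower elsewhere".  */
int
vec_perm_cost (bool have_vsx, enum tune_cpu tune)
{
  /* Power7 has only one permute unit, so keep permutes more expensive.  */
  if (have_vsx && tune == TUNE_POWER7)
    return 3;

  /* Power8 and Power9 (and non-VSX targets) get the lower cost.  */
  return 1;
}
```

With this shape, only the Power7 VSX case keeps the higher permute cost, and
Power8 and Power9 fall through to the lower one.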


Kewen
