Hi Segher,

on 2019/9/29 3:28 PM, Segher Boessenkool wrote:
> Hi!
>
> On Sun, Sep 29, 2019 at 01:38:31PM +0800, Kewen.Lin wrote:
>> Recently we are revisiting vectorization cost setting in
>> rs6000_builtin_vectorization_cost, and found the current cost of
>> vec_perm on VSX looks overpriced for Power8 and Power9.
>
> Yeah it does.
>
>> The high
>> cost was set for Power7 single VSU pipe, but Power8 and Power9
>> have supported more VSX units,
>
> Power7 has two VSU pipes, but only one permute (and only one of the other
> VMX units, and only one store). It can do two VSX or two classic FP, and
> various combinations.
>

Yes, only one unit for vector permute. Thanks for the correction!

>> the performance evaluation on
>> SPEC2017 Power9 shows 2%+ gain on 538.imagick_r, while SPEC2006/
>> SPEC2017 Power8 evaluations show ~2% gains on 453.povray/
>> 511.povray_r, all don't have any remarkable degradations.
>
> Nice :-) What does it do to geomean, is that in the right direction
> as well?
>

The ratio geomean speedup data is:
  SPEC2006 P8 +0.08%, SPEC2017 P8 +0.16%, SPEC2017 P9 +0.23%.

> As a follow-up, vec_promote_demote is costed similarly higher for VSX
> than for other things, it might help to change that one as well?
>

You are right, I found it's helpful for 525.x264_r, but it caused one suite
degradation on P9, still investigating it.

>> This patch is to lower vec_perm vectorization cost for all
>> non Power7 VSX architecture (currently Power8 and Power9).
>
> Okay for trunk with the comment fixed (just say p7 has only one permute
> unit?), and assuming geomean is fine too. Thanks!

I'll update it by saying "Power7 has only one permute unit".

Kewen