Hi,

This patch lowers the vec_promote_demote vectorization cost in rs6000_builtin_vectorization_cost. As with the change we committed for vec_perm, the current cost for vec_promote_demote is too high for Power8 and Power9, since both provide more units for permute/unpack/pack than the single one on Power7.
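
To make the intended rule concrete, here is a minimal standalone sketch of the cost before and after the change. It is only a model, not the GCC code: the enum cpu_tune, its values, and both function names are hypothetical stand-ins for rs6000_tune, the PROCESSOR_* values, and the vec_promote_demote case, and has_vsx stands in for TARGET_VSX.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the rs6000_tune processor setting.  */
enum cpu_tune { CPU_POWER7, CPU_POWER8, CPU_POWER9 };

/* Model of the old rule: every VSX target pays 4 for a
   promote/demote statement, regardless of tuning.  */
int
promote_demote_cost_old (bool has_vsx, enum cpu_tune tune)
{
  (void) tune;  /* Tuning was not consulted before the patch.  */
  return has_vsx ? 4 : 1;
}

/* Model of the new rule: only VSX targets tuned for Power7,
   which has a single permute/pack unit, still pay 4.  */
int
promote_demote_cost_new (bool has_vsx, enum cpu_tune tune)
{
  if (has_vsx && tune == CPU_POWER7)
    return 4;
  return 1;
}
```

In this model only the VSX cases tuned for Power8 or Power9 change, dropping from 4 to 1. Power7 and non-VSX results are the same under both rules.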

Performance evaluation on SPEC2017 Power9 shows a +2.88% gain on 525.x264_r and a -1.70% degradation on 526.blender_r, which has been identified as merely exposing some other issues and is actually unrelated to this change. SPEC2017 Power8 evaluation shows a +4.63% gain on 525.x264_r without any significant degradations, and SPEC2006 Power8 evaluation shows a 1.99% gain on 453.povray. The geomean gain for SPEC2017 on both Power8 and Power9 is +0.06%, and the geomean is unchanged for SPEC2006 Power8.

Bootstrapped and regression tested on powerpc64le-linux-gnu.

Is it OK for trunk?

Thanks,
Kewen

gcc/ChangeLog

2019-10-09  Kewen Lin  <li...@gcc.gnu.org>

        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Lower
        vec_promote_demote cost to 1 for non-Power7 VSX architectures.

----

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2fd9808..8040577 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4781,10 +4781,11 @@ rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
         return 1;
 
       case vec_promote_demote:
-        if (TARGET_VSX)
-          return 4;
-        else
-          return 1;
+        /* Power7 has only one permute/pack unit, make it a bit expensive. */
+        if (TARGET_VSX && rs6000_tune == PROCESSOR_POWER7)
+          return 4;
+        else
+          return 1;
 
       case cond_branch_taken:
         return 3;