Hi,

This patch lowers the vec_promote_demote vectorization cost in
rs6000_builtin_vectorization_cost.  Similar to what we committed
for vec_perm, the current cost for vec_promote_demote is
overpriced for Power8 and Power9, since Power8 and Power9 have
more units for permute/unpack/pack than the single one
on Power7.
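
For illustration only, here is a minimal sketch of the kind of loops
this cost applies to, assuming the vectorizer costs element-widening
and element-narrowing conversions as vec_promote_demote.  The function
names are hypothetical and are not taken from any benchmark or from
the GCC testsuite.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example.  Promotion: each 16-bit element is widened to
   32 bits.  When vectorized, this typically needs unpack-style
   instructions.  */
void
widen_s16_to_s32 (int32_t *restrict dst, const int16_t *restrict src,
		  size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}

/* Hypothetical example.  Demotion: each 32-bit element is converted to
   16 bits.  When vectorized, this typically needs pack-style
   instructions.  */
void
narrow_s32_to_s16 (int16_t *restrict dst, const int32_t *restrict src,
		   size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (int16_t) src[i];
}
```

With a cost of 4 per such statement, the cost model is more likely to
judge loops like these unprofitable to vectorize than with a cost of 1.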

The performance evaluation on SPEC2017 Power9 shows a +2.88% gain on
525.x264_r and a -1.70% degradation on 526.blender_r; the latter has
been identified as exposing some other issues and is unrelated to this
change.  The SPEC2017 Power8 evaluation shows a +4.63% gain on
525.x264_r without any significant degradations, and the SPEC2006
Power8 evaluation shows a +1.99% gain on 453.povray.  The geomean gain
for SPEC2017 on both Power8 and Power9 is +0.06%, and the geomean is
unchanged for SPEC2006 Power8.

Bootstrapped and regression tested on powerpc64le-linux-gnu.
Is it OK for trunk?


Thanks,
Kewen


gcc/ChangeLog

2019-10-09  Kewen Lin  <li...@gcc.gnu.org>

        * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Lower
        vec_promote_demote cost to 1 for non-Power7 VSX architectures.

----

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2fd9808..8040577 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4781,10 +4781,11 @@ rs6000_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
          return 1;

       case vec_promote_demote:
-        if (TARGET_VSX)
-          return 4;
-        else
-          return 1;
+       /* Power7 has only one permute/pack unit, make it a bit expensive.  */
+       if (TARGET_VSX && rs6000_tune == PROCESSOR_POWER7)
+         return 4;
+       else
+         return 1;

       case cond_branch_taken:
         return 3;
