This fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50969 by slightly raising the cost of vector permutes on powerpc64 VSX targets (and ensuring those costs are correctly used). This reverses the performance loss for 168.wupwise, and gives a slight boost to 433.milc as well.
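To illustrate how the permute cost enters the model, here is a minimal standalone sketch of the strided-access term touched by the patch. The helper names exact_log2_sketch and strided_permute_cost are hypothetical stand-ins, not GCC functions, and the sketch assumes group_size is a power of two. It only mirrors the arithmetic of the inside_cost expression and is not the vectorizer's actual code.

```c
/* Hypothetical stand-in for GCC's exact_log2: returns log2 (x) when x
   is a power of two, and -1 otherwise.  */
static int
exact_log2_sketch (unsigned int x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;

  while (x > 1)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Mirrors the permute term of inside_cost for strided loads and stores:
   ncopies * log2 (group_size) * group_size * perm_cost.
   PERM_COST plays the role of vect_get_stmt_cost (vec_perm); the caller
   is assumed to pass a power-of-two GROUP_SIZE.  */
static int
strided_permute_cost (int ncopies, int group_size, int perm_cost)
{
  return ncopies * exact_log2_sketch ((unsigned int) group_size)
	 * group_size * perm_cost;
}
```

For example, strided_permute_cost (1, 4, 1) evaluates to 8 and strided_permute_cost (1, 4, 2) evaluates to 16, so doubling the vec_perm cost doubles this term.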
In the long run, we will want to model VSX permutes differently, since the real issue is that only one floating-point pipe can hold a permute at a time. Thus the present patch can be overly conservative when permutes are rare compared with other vector instructions.

Bootstrapped and regtested on powerpc64-linux-gnu with no failures. OK for trunk?

Thanks,
Bill


2012-02-03  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

	PR tree-optimization/50969
	* tree-vect-stmts.c (vect_model_store_cost): Correct statement cost to
	use vec_perm rather than vector_stmt.
	(vect_model_load_cost): Likewise.
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Revise
	cost of vec_perm for TARGET_VSX.


Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	(revision 183871)
+++ gcc/tree-vect-stmts.c	(working copy)
@@ -882,7 +882,7 @@ vect_model_store_cost (stmt_vec_info stmt_info, in
     {
       /* Uses a high and low interleave operation for each needed permute.  */
       inside_cost = ncopies * exact_log2(group_size) * group_size
-        * vect_get_stmt_cost (vector_stmt);
+        * vect_get_stmt_cost (vec_perm);
 
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d .",
@@ -988,7 +988,7 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
     {
       /* Uses an even and odd extract operations for each needed permute.  */
       inside_cost = ncopies * exact_log2(group_size) * group_size
-        * vect_get_stmt_cost (vector_stmt);
+        * vect_get_stmt_cost (vec_perm);
 
       if (vect_print_dump_info (REPORT_COST))
         fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 183871)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -3540,9 +3540,13 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
       case vec_to_scalar:
       case scalar_to_vec:
       case cond_branch_not_taken:
-      case vec_perm:
         return 1;
 
+      case vec_perm:
+        if (!TARGET_VSX)
+          return 1;
+        return 2;
+
       case cond_branch_taken:
         return 3;