The following patch restricts the previous fix for PR84037 to strided loads with a non-constant step. This avoids regressing the nbench LU decomposition test on Haswell, where the earlier change makes us use AVX128 instead of AVX256 in the two critical loops.
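
For illustration only (this is not taken from nbench or the testsuite), the two loops below sketch the distinction the patch draws between a compile-time-constant step and a step known only at run time. The function names and the stride of 4 are made up for this example, and whether the vectorizer actually chooses VMAT_ELEMENTWISE for either loop depends on the target and options.

```c
#include <stddef.h>

/* Constant step: the distance between consecutive loads (4 doubles)
   is known at compile time, so the step of the data reference would
   be an integer constant.  */
double
sum_const_step (const double *a, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += a[i * 4];
  return s;
}

/* Non-constant step: the stride is a run-time value, so the step of
   the data reference is not an integer constant.  This is the case
   the increased vec_construct cost is restricted to.  */
double
sum_var_step (const double *a, size_t n, size_t stride)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += a[i * stride];
  return s;
}
```

Both functions compute the same kind of strided sum; only the second one has a step the compiler cannot see.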

Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2006 results are in the noise, so is SPEC CPU 2000 (200.sixtrack seems to be awfully jumpy for me - it goes up and down by almost 50%!), nbench LU factorization performance is back up.

OK for trunk?

Thanks,
Richard.

2018-04-24  Richard Biener  <rguent...@suse.de>

	PR target/85491
	* config/i386/i386.c (ix86_add_stmt_cost): Restrict strided
	load cost increase to the case of non-constant step.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 259556)
+++ gcc/config/i386/i386.c	(working copy)
@@ -50550,8 +50550,9 @@ ix86_add_stmt_cost (void *data, int coun
      construction cost by the number of elements involved.  */
   if (kind == vec_construct
       && stmt_info
-      && stmt_info->type == load_vec_info_type
-      && stmt_info->memory_access_type == VMAT_ELEMENTWISE)
+      && STMT_VINFO_TYPE (stmt_info) == load_vec_info_type
+      && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_ELEMENTWISE
+      && TREE_CODE (DR_STEP (STMT_VINFO_DATA_REF (stmt_info))) != INTEGER_CST)
     {
       stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
       stmt_cost *= TYPE_VECTOR_SUBPARTS (vectype);