When diverting to VMAT_GATHER_SCATTER we fail to zero *poffset, which was previously set if a load was classified as VMAT_CONTIGUOUS_REVERSE.  The following refactors get_group_load_store_type a bit to avoid this, but this all needs some serious TLC.
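
For illustration only, a minimal sketch of the pattern the patch applies (the names and types below are made up for the example and are not GCC internals): compute the negative-stride offset into a local first, and only publish it through the out parameter once the final classification is known, so a later switch to a gather/scatter access cannot leave a stale offset behind.

/* Illustrative sketch, not the actual GCC code.  */
enum access_kind { CONTIGUOUS, CONTIGUOUS_REVERSE, GATHER_SCATTER };

static access_kind
classify (bool negative_stride, bool must_fall_back, long *poffset)
{
  long neg_offset = 0;
  access_kind kind = CONTIGUOUS;

  if (negative_stride)
    {
      /* Classification side effect goes into a local, not *poffset.  */
      kind = CONTIGUOUS_REVERSE;
      neg_offset = -1;
    }

  if (must_fall_back)
    /* A later decision may override the earlier classification.  */
    kind = GATHER_SCATTER;

  /* Publish the offset only when it is still meaningful.  */
  if (kind == CONTIGUOUS_REVERSE)
    *poffset = neg_offset;

  return kind;
}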
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

	PR tree-optimization/117709
	* tree-vect-stmts.cc (get_group_load_store_type): Only set
	*poffset when we end up with VMAT_CONTIGUOUS_DOWN or
	VMAT_CONTIGUOUS_REVERSE.
---
 gcc/tree-vect-stmts.cc | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 752ee457f6d..522e9f7f90f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2048,6 +2048,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
   unsigned int group_size;
   unsigned HOST_WIDE_INT gap;
   bool single_element_p;
+  poly_int64 neg_ldst_offset = 0;
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
       first_stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
@@ -2105,7 +2106,8 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	/* ???  The VMAT_CONTIGUOUS_REVERSE code generation is only correct
 	   for single element "interleaving" SLP.  */
 	*memory_access_type = get_negative_load_store_type
-	  (vinfo, stmt_info, vectype, vls_type, 1, poffset);
+	  (vinfo, stmt_info, vectype, vls_type, 1,
+	   &neg_ldst_offset);
       else
 	{
 	  /* Try to use consecutive accesses of DR_GROUP_SIZE elements,
@@ -2375,6 +2377,10 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 					   masked_p, gs_info, elsvals))
     *memory_access_type = VMAT_GATHER_SCATTER;

+  if (*memory_access_type == VMAT_CONTIGUOUS_DOWN
+      || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+    *poffset = neg_ldst_offset;
+
   if (*memory_access_type == VMAT_GATHER_SCATTER
       || *memory_access_type == VMAT_ELEMENTWISE
       || *memory_access_type == VMAT_STRIDED_SLP
-- 
2.43.0