vectorizable_load had a curious "force_peeling" variable, with no comment explaining why we need it for single-element interleaving but not for other cases. I think it's simply because we weren't initialising the GROUP_GAP correctly for single loads.
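For readers less familiar with the terminology, here is a small standalone sketch of what single-element interleaving looks like at the source level. It is not GCC code: GROUP_SIZE here is an ordinary constant rather than the vectorizer macro, and single_element_gap and sum_first_of_each_group are hypothetical names used only for illustration. It assumes that the gap for such a group is the number of unread elements after the one accessed, i.e. groupsize - 1, which is the value the patch stores.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only: each group spans GROUP_SIZE ints, but the loop
   reads just the first element of every group.  */
enum { GROUP_SIZE = 4 };

/* Hypothetical helper: number of unread elements that follow the single
   accessed element of a group (assumed to be groupsize - 1).  */
static int
single_element_gap (int groupsize)
{
  assert (groupsize >= 1);
  return groupsize - 1;
}

/* Sum a[0], a[GROUP_SIZE], a[2 * GROUP_SIZE], ...  The remaining
   GROUP_SIZE - 1 elements of each group are never read by the scalar
   loop.  */
static int
sum_first_of_each_group (const int *a, size_t n_groups)
{
  int sum = 0;
  for (size_t i = 0; i < n_groups; i++)
    sum += a[i * GROUP_SIZE];
  return sum;
}
```

If a loop like this were vectorized with full-width loads, the last vector iteration could read elements after the final one that the scalar loop touches, which is why a nonzero gap goes together with peeling for gaps.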

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
	* tree-vect-data-refs.c (vect_analyze_group_access_1): Set
	GROUP_GAP for single-element interleaving.
	* tree-vect-stmts.c (vectorizable_load): Remove force_peeling
	variable.

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7652e21..36d302a 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2233,6 +2233,7 @@ vect_analyze_group_access_1 (struct data_reference *dr)
 	{
 	  GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = stmt;
 	  GROUP_SIZE (vinfo_for_stmt (stmt)) = groupsize;
+	  GROUP_GAP (stmt_info) = groupsize - 1;
 	  if (dump_enabled_p ())
 	    {
 	      dump_printf_loc (MSG_NOTE, vect_location,
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9ab4af4..585c073 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6319,7 +6319,6 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 	 that leaves unused vector loads around punt - we at least create
 	 very sub-optimal code in that case (and blow up memory,
 	 see PR65518).  */
-      bool force_peeling = false;
       if (first_stmt == stmt
 	  && !GROUP_NEXT_ELEMENT (stmt_info))
 	{
@@ -6333,7 +6332,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 	    }
 
 	  /* Single-element interleaving requires peeling for gaps.  */
-	  force_peeling = true;
+	  gcc_assert (GROUP_GAP (stmt_info));
 	}
 
       /* If there is a gap in the end of the group or the group size cannot
@@ -6341,8 +6340,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 	 elements in the last iteration and thus need to peel that off.  */
       if (loop_vinfo
 	  && ! STMT_VINFO_STRIDED_P (stmt_info)
-	  && (force_peeling
-	      || GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
+	  && (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
 	      || (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0)))
 	{
 	  if (dump_enabled_p ())