Hi,

This is a fix following Richi's comments here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542232.html
I noticed that the current half-vector support for no peeling for gaps covers some cases without ever checking whether a half-size vector type is actually needed.  On further investigation, those cases are safe without peeling for gaps because of their ideal alignment: they don't require the half-vector handling at all, so we should avoid using half vectors for them.  The fix adds an alignment check to the conditions guarding the half-vector support, avoiding the redundant half-vector code.

Bootstrapped/regtested on powerpc64le-linux-gnu P8, while aarch64-linux-gnu testing is ongoing.

Is it ok for trunk if all testings are fine?

BR,
Kewen

----------------
gcc/ChangeLog

2020-MM-DD  Kewen Lin  <li...@gcc.gnu.org>

	* gcc/tree-vect-stmts.c (vectorizable_load): Check alignment to
	avoid redundant half-vector handling for no peeling for gaps.
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 7f3a9fb5fb3..bfd2fabaa81 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9582,6 +9582,12 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	{
 	  tree ltype = vectype;
 	  tree new_vtype = NULL_TREE;
+	  unsigned HOST_WIDE_INT gap
+	    = DR_GROUP_GAP (first_stmt_info);
+	  unsigned int vect_align
+	    = vect_known_alignment_in_bytes (first_dr_info);
+	  unsigned int scalar_dr_size
+	    = vect_get_scalar_dr_size (first_dr_info);
 	  /* If there's no peeling for gaps but we have a gap
 	     with slp loads then load the lower half of the
 	     vector only.  See get_group_load_store_type for
@@ -9589,11 +9595,10 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	  if (slp
 	      && loop_vinfo
 	      && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-	      && DR_GROUP_GAP (first_stmt_info) != 0
-	      && known_eq (nunits,
-			   (group_size
-			    - DR_GROUP_GAP (first_stmt_info)) * 2)
-	      && known_eq (nunits, group_size))
+	      && gap != 0
+	      && known_eq (nunits, (group_size - gap) * 2)
+	      && known_eq (nunits, group_size)
+	      && gap >= (vect_align / scalar_dr_size))
 	    {
 	      tree half_vtype;
 	      new_vtype
@@ -9608,10 +9613,9 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	  if (ltype != vectype
 	      && memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 	    {
-	      unsigned HOST_WIDE_INT gap
-		= DR_GROUP_GAP (first_stmt_info);
-	      gap *= tree_to_uhwi (TYPE_SIZE_UNIT (elem_type));
-	      tree gapcst = build_int_cst (ref_type, gap);
+	      unsigned HOST_WIDE_INT gap_offset
+		= gap * tree_to_uhwi (TYPE_SIZE_UNIT (elem_type));
+	      tree gapcst = build_int_cst (ref_type, gap_offset);
 	      offset = size_binop (PLUS_EXPR, offset, gapcst);
 	    }
 	  data_ref