When alias exists between data refs in a loop, to vectorize it GCC does loop versioning and adds runtime alias checks. Basically for each pair of data refs with possible data dependence, there will be two comparisons generated to make sure there is no aliasing between them in each iteration of the vectorized loop. If there are many such data refs pairs, the number of comparisons can be very large, which is a big overhead.
However, in some cases it is possible to reduce the number of those comparisons. For example, for the following loop, we can detect that b[0] and b[1] are two consecutive member accesses so that we can combine the alias check between a[0:100]&b[0] and a[0:100]&b[1] into checking a[0:100]&b[0:2]: void foo(int*a, int* b) { for (int i = 0; i < 100; ++i) a[i] = b[0] + b[1]; } Actually, the requirement of consecutive memory accesses is too strict. For the following loop, we can still combine the alias checks between a[0:100]&b[0] and a[0:100]&b[100]: void foo(int*a, int* b) { for (int i = 0; i < 100; ++i) a[i] = b[0] + b[100]; } This is because if b[0] is not in a[0:100] and b[100] is not in a[0:100] then a[0:100] cannot be between b[0] and b[100]. We only need to check a[0:100] and b[0:101] don't overlap. More generally, consider two pairs of data refs (a, b1) and (a, b2). Suppose addr_b1 and addr_b2 are basic addresses of data ref b1 and b2; offset_b1 and offset_b2 (offset_b1 < offset_b2) are offsets of b1 and b2, and segment_length_a, segment_length_b1, and segment_length_b2 are segment length of a, b1, and b2. Then we can combine the two comparisons into one if the following condition is satisfied: offset_b2- offset_b1 - segment_length_b1 < segment_length_a This patch detects those combination opportunities to reduce the number of alias checks. It is tested on an x86-64 machine. thanks, Cong Index: gcc/tree-vect-loop-manip.c =================================================================== --- gcc/tree-vect-loop-manip.c (revision 202662) +++ gcc/tree-vect-loop-manip.c (working copy) @@ -19,6 +19,10 @@ You should have received a copy of the G along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>. */ +#include <vector> +#include <utility> +#include <algorithm> + #include "config.h" #include "system.h" #include "coretypes.h" @@ -2248,6 +2252,74 @@ vect_vfa_segment_size (struct data_refer return segment_length; } +namespace +{ + +/* struct dr_addr_with_seg_len + + A struct storing information of a data reference, including the data + ref itself, its basic address, the access offset and the segment length + for aliasing checks. */ + +struct dr_addr_with_seg_len +{ + dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len) + : dr (d), basic_addr (addr), offset (off), seg_len (len) {} + + data_reference* dr; + tree basic_addr; + tree offset; + tree seg_len; +}; + +/* Operator == between two dr_addr_with_seg_len objects. + + This equality operator is used to make sure two data refs + are the same one so that we will consider to combine the + aliasing checks of those two pairs of data dependent data + refs. */ + +bool operator == (const dr_addr_with_seg_len& d1, + const dr_addr_with_seg_len& d2) +{ + return operand_equal_p (d1.basic_addr, d2.basic_addr, 0) + && operand_equal_p (d1.offset, d2.offset, 0) + && operand_equal_p (d1.seg_len, d2.seg_len, 0); +} + +typedef std::pair <dr_addr_with_seg_len, dr_addr_with_seg_len> + dr_addr_with_seg_len_pair_t; + + +/* Operator < between two dr_addr_with_seg_len_pair_t objects. + + This operator is used to sort objects of dr_addr_with_seg_len_pair_t + so that we can combine aliasing checks during one scan. */ + +bool operator < (const dr_addr_with_seg_len_pair_t& p1, + const dr_addr_with_seg_len_pair_t& p2) +{ + const dr_addr_with_seg_len& p11 = p1.first; + const dr_addr_with_seg_len& p12 = p1.second; + const dr_addr_with_seg_len& p21 = p2.first; + const dr_addr_with_seg_len& p22 = p2.second; + + if (p11.basic_addr != p21.basic_addr) + return p11.basic_addr < p21.basic_addr; + if (p12.basic_addr != p22.basic_addr) + return p12.basic_addr < p22.basic_addr; + if (TREE_CODE (p11.offset) != INTEGER_CST + || TREE_CODE (p21.offset) != INTEGER_CST) + return p11.offset < p21.offset; + if (int_cst_value (p11.offset) != int_cst_value (p21.offset)) + return int_cst_value (p11.offset) < int_cst_value (p21.offset); + if (TREE_CODE (p12.offset) != INTEGER_CST + || TREE_CODE (p22.offset) != INTEGER_CST) + return p12.offset < p22.offset; + return int_cst_value (p12.offset) < int_cst_value (p22.offset); +} + +} /* Function vect_create_cond_for_alias_checks. @@ -2292,20 +2364,51 @@ vect_create_cond_for_alias_checks (loop_ if (may_alias_ddrs.is_empty ()) return; + + /* Basically, for each pair of dependent data refs store_ptr_0 + and load_ptr_0, we create an expression: + + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_0 + load_segment_length_0) <= store_ptr_0)) + + for aliasing checks. However, in some cases we can decrease + the number of checks by combining two checks into one. For + example, suppose we have another pair of data refs store_ptr_0 + and load_ptr_1, and if the following condition is satisfied: + + load_ptr_0 < load_ptr_1 && + load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0 + + (this condition means, in each iteration of vectorized loop, + the accessed memory of store_ptr_0 cannot be between the memory + of load_ptr_0 and load_ptr_1.) + + we then can use only the following expression to finish the + alising checks between store_ptr_0 & load_ptr_0 and + store_ptr_0 & load_ptr_1: + + ((store_ptr_0 + store_segment_length_0) <= load_ptr_0) + || (load_ptr_1 + load_segment_length_1 <= store_ptr_0)) + + Note that we only consider that load_ptr_0 and load_ptr_1 have the + same basic address. */ + + std::vector<dr_addr_with_seg_len_pair_t> ddrs_with_seg_len; + + /* First, we collect all data ref pairs for aliasing checks. */ + FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { struct data_reference *dr_a, *dr_b; gimple dr_group_first_a, dr_group_first_b; - tree addr_base_a, addr_base_b; tree segment_length_a, segment_length_b; gimple stmt_a, stmt_b; - tree seg_a_min, seg_a_max, seg_b_min, seg_b_max; dr_a = DDR_A (ddr); stmt_a = DR_STMT (DDR_A (ddr)); dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); if (dr_group_first_a) - { + { stmt_a = dr_group_first_a; dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a)); } @@ -2314,20 +2417,11 @@ vect_create_cond_for_alias_checks (loop_ stmt_b = DR_STMT (DDR_B (ddr)); dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b)); if (dr_group_first_b) - { + { stmt_b = dr_group_first_b; dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b)); } - addr_base_a - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a), - size_binop (PLUS_EXPR, DR_OFFSET (dr_a), - DR_INIT (dr_a))); - addr_base_b - = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b), - size_binop (PLUS_EXPR, DR_OFFSET (dr_b), - DR_INIT (dr_b))); - if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0)) length_factor = scalar_loop_iters; else @@ -2335,24 +2429,149 @@ vect_create_cond_for_alias_checks (loop_ segment_length_a = vect_vfa_segment_size (dr_a, length_factor); segment_length_b = vect_vfa_segment_size (dr_b, length_factor); + dr_addr_with_seg_len_pair_t dr_with_seg_len_pair + (dr_addr_with_seg_len + (dr_a, DR_BASE_ADDRESS (dr_a), + size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)), + segment_length_a), + dr_addr_with_seg_len + (dr_b, DR_BASE_ADDRESS (dr_b), + size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)), + segment_length_b)); + + if (dr_with_seg_len_pair.first.basic_addr > + dr_with_seg_len_pair.second.basic_addr) + std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); + + ddrs_with_seg_len.push_back (dr_with_seg_len_pair); + } + + /* Second, we sort the collected data ref pairs so that we can scan + them once to combine all possible aliasing checks. */ + + std::sort (ddrs_with_seg_len.begin(), ddrs_with_seg_len.end()); + + /* Remove duplicate data ref pairs. */ + ddrs_with_seg_len.erase (std::unique (ddrs_with_seg_len.begin(), + ddrs_with_seg_len.end()), + ddrs_with_seg_len.end()); + + /* We then scan the sorted dr pairs and check if we can combine + alias checks of two neighbouring dr pairs. */ + + for (size_t i = 1; i < ddrs_with_seg_len.size (); ++i) + { + dr_addr_with_seg_len& dr_a1 = ddrs_with_seg_len[i-1].first; + dr_addr_with_seg_len& dr_b1 = ddrs_with_seg_len[i-1].second; + dr_addr_with_seg_len& dr_a2 = ddrs_with_seg_len[i].first; + dr_addr_with_seg_len& dr_b2 = ddrs_with_seg_len[i].second; + + if (dr_a1 == dr_a2) + { + if (dr_b1.basic_addr != dr_b2.basic_addr + || TREE_CODE (dr_b1.offset) != INTEGER_CST + || TREE_CODE (dr_b2.offset) != INTEGER_CST) + continue; + + int diff = int_cst_value (dr_b2.offset) - + int_cst_value (dr_b1.offset); + + gcc_assert (diff > 0); + + if (diff <= vect_factor + || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST + && diff - int_cst_value (dr_b1.seg_len) < vect_factor) + || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST + && TREE_CODE (dr_a1.seg_len) == INTEGER_CST + && diff - int_cst_value (dr_b1.seg_len) < + int_cst_value (dr_a1.seg_len))) + { + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + "combining two runtime checks for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1.dr)); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2.dr)); + dump_printf (MSG_NOTE, "\n"); + } + + dr_b1.seg_len = size_binop (PLUS_EXPR, + dr_b2.seg_len, size_int (diff)); + ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i); + --i; + } + } + else if (dr_b1 == dr_b2) + { + if (dr_a1.basic_addr != dr_a2.basic_addr + || TREE_CODE (dr_a1.offset) != INTEGER_CST + || TREE_CODE (dr_a2.offset) != INTEGER_CST) + continue; + + int diff = int_cst_value (dr_a2.offset) - + int_cst_value (dr_a1.offset); + + gcc_assert (diff > 0); + + if (diff <= vect_factor + || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST + && diff - int_cst_value (dr_a1.seg_len) < vect_factor) + || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST + && TREE_CODE (dr_b1.seg_len) == INTEGER_CST + && diff - int_cst_value (dr_a1.seg_len) < + int_cst_value (dr_b1.seg_len))) + { + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + "combining two runtime checks for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1.dr)); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2.dr)); + dump_printf (MSG_NOTE, "\n"); + } + + dr_a1.seg_len = size_binop (PLUS_EXPR, + dr_a2.seg_len, size_int (diff)); + ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i); + --i; + } + } + } + + for (size_t i = 0, s = ddrs_with_seg_len.size (); i < s; ++i) + { + const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first; + const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second; + tree segment_length_a = dr_a.seg_len; + tree segment_length_b = dr_b.seg_len; + + tree addr_base_a + = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset); + tree addr_base_b + = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset); + if (dump_enabled_p ()) { dump_printf_loc (MSG_NOTE, vect_location, - "create runtime check for data references "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a)); + "create runtime check for data references "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr)); dump_printf (MSG_NOTE, " and "); - dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b)); - dump_printf (MSG_NOTE, "\n"); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr)); + dump_printf (MSG_NOTE, "\n"); } - seg_a_min = addr_base_a; - seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); - if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0) + tree seg_a_min = addr_base_a; + tree seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a); + if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0) seg_a_min = seg_a_max, seg_a_max = addr_base_a; - seg_b_min = addr_base_b; - seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); - if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0) + tree seg_b_min = addr_base_b; + tree seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b); + if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0) seg_b_min = seg_b_max, seg_b_max = addr_base_b; part_cond_expr = @@ -2477,6 +2696,81 @@ vect_loop_versioning (loop_vec_info loop adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); } + /* Extract load and store statements on pointers with zero-stride + accesses. */ + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) + { + + /* In the loop body, we iterate each statement to check if it is a load + or store. Then we check the DR_STEP of the data reference. If + DR_STEP is zero, then we will hoist the load statement to the loop + preheader, and move the store statement to the loop exit. */ + + for (gimple_stmt_iterator si = gsi_start_bb (loop->header); + !gsi_end_p (si); ) + { + gimple stmt = gsi_stmt (si); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + + + if (dr && integer_zerop (DR_STEP (dr))) + { + if (DR_IS_READ (dr)) + { + if (dump_file) + { + fprintf (dump_file, + "Hoist the load to outside of the loop:\n"); + print_gimple_stmt (dump_file, stmt, 0, + TDF_VOPS|TDF_MEMSYMS); + } + + basic_block preheader = loop_preheader_edge (loop)->src; + gimple_stmt_iterator si_dst = gsi_last_bb (preheader); + gsi_move_after (&si, &si_dst); + } + else + { + gimple_stmt_iterator si_dst = + gsi_last_bb (single_exit (loop)->dest); + gsi_move_after (&si, &si_dst); + } + continue; + } + else if (!dr) + { + bool hoist = true; + for (size_t i = 0; i < gimple_num_ops (stmt); i++) + { + tree op = gimple_op (stmt, i); + if (TREE_CODE (op) == INTEGER_CST + || TREE_CODE (op) == REAL_CST) + continue; + if (TREE_CODE (op) == SSA_NAME) + { + gimple def = SSA_NAME_DEF_STMT (op); + if (def == stmt + || gimple_nop_p (def) + || !flow_bb_inside_loop_p (loop, gimple_bb (def))) + continue; + } + hoist = false; + break; + } + + if (hoist) + { + basic_block preheader = loop_preheader_edge (loop)->src; + gimple_stmt_iterator si_dst = gsi_last_bb (preheader); + gsi_move_after (&si, &si_dst); + continue; + } + } + gsi_next (&si); + } + } + /* End loop-exit-fixes after versioning. */ if (cond_expr_stmt_list) Index: gcc/ChangeLog =================================================================== --- gcc/ChangeLog (revision 202663) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2013-10-01 Cong Hou <co...@google.com> + + * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Combine + alias checks if it is possible to amortize the runtime overhead. +