> On Oct 1, 2013, at 7:12 PM, Cong Hou <co...@google.com> wrote:
>
> When possible aliasing exists between data refs in a loop, to vectorize it
> GCC does loop versioning and adds runtime alias checks. Basically, for each
> pair of data refs with possible data dependence, two comparisons are
> generated to make sure there is no aliasing between them in each iteration
> of the vectorized loop. If there are many such data ref pairs, the number
> of comparisons can be very large, which is a big overhead.
>
> However, in some cases it is possible to reduce the number of those
> comparisons. For example, for the following loop, we can detect that
> b[0] and b[1] are two consecutive member accesses so that we can
> combine the alias check between a[0:100]&b[0] and a[0:100]&b[1] into
> checking a[0:100]&b[0:2]:
>
> void foo(int* a, int* b)
> {
>   for (int i = 0; i < 100; ++i)
>     a[i] = b[0] + b[1];
> }
>
> Actually, the requirement of consecutive memory accesses is too
> strict. For the following loop, we can still combine the alias checks
> between a[0:100]&b[0] and a[0:100]&b[100]:
>
> void foo(int* a, int* b)
> {
>   for (int i = 0; i < 100; ++i)
>     a[i] = b[0] + b[100];
> }
>
> This is because if b[0] is not in a[0:100] and b[100] is not in
> a[0:100], then a[0:100] cannot lie between b[0] and b[100]. We only
> need to check that a[0:100] and b[0:101] don't overlap.
>
> More generally, consider two pairs of data refs (a, b1) and (a, b2).
> Suppose addr_b1 and addr_b2 are the base addresses of data refs b1 and
> b2; offset_b1 and offset_b2 (offset_b1 < offset_b2) are the offsets of
> b1 and b2; and segment_length_a, segment_length_b1, and
> segment_length_b2 are the segment lengths of a, b1, and b2. Then we can
> combine the two comparisons into one if the following condition is
> satisfied:
>
>   offset_b2 - offset_b1 - segment_length_b1 < segment_length_a
>
> This patch detects those combination opportunities to reduce the
> number of alias checks. It is tested on an x86-64 machine.
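As a quick illustration of the merging condition quoted above (this sketch is
not taken from the patch; the function name and the plain byte-count
parameters are invented for the example, whereas the patch itself works on
GCC trees):

#include <stdbool.h>

/* Return true if the two runtime alias checks (a, b1) and (a, b2) can be
   merged into a single check of a against the combined segment that starts
   at offset_b1 and ends at the end of b2's segment.  Requires
   offset_b1 < offset_b2.  If the gap between the end of b1's segment and
   the start of b2's segment is smaller than a's segment, then a cannot sit
   entirely inside that gap, so one combined check is enough.  */
static bool
can_merge_alias_checks (long offset_b1, long segment_length_b1,
                        long offset_b2, long segment_length_a)
{
  return offset_b2 - offset_b1 - segment_length_b1 < segment_length_a;
}

For the second loop above with 4-byte ints, offset_b1 = 0,
segment_length_b1 = 4, offset_b2 = 400 and segment_length_a = 400, the sketch
returns true (400 - 0 - 4 < 400), so the two checks collapse into a single
check that a[0:100] and b[0:101] do not overlap.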
I like the idea of this patch, but I am not a fan of using the STL, really.
It seems like a little too much dependence on C++ features for my liking.

Thanks,
Andrew

>
>
> thanks,
> Cong
>
>
>
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> --- gcc/tree-vect-loop-manip.c (revision 202662)
> +++ gcc/tree-vect-loop-manip.c (working copy)
> @@ -19,6 +19,10 @@ You should have received a copy of the G
>  along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>
> +#include <vector>
> +#include <utility>
> +#include <algorithm>
> +
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -2248,6 +2252,74 @@ vect_vfa_segment_size (struct data_refer
>    return segment_length;
>  }
>
> +namespace
> +{
> +
> +/* struct dr_addr_with_seg_len
> +
> +   A struct storing information about a data reference, including the
> +   data ref itself, its base address, the access offset and the segment
> +   length for aliasing checks.  */
> +
> +struct dr_addr_with_seg_len
> +{
> +  dr_addr_with_seg_len (data_reference* d, tree addr, tree off, tree len)
> +    : dr (d), basic_addr (addr), offset (off), seg_len (len) {}
> +
> +  data_reference* dr;
> +  tree basic_addr;
> +  tree offset;
> +  tree seg_len;
> +};
> +
> +/* Operator == between two dr_addr_with_seg_len objects.
> +
> +   This equality operator is used to make sure two data refs
> +   are the same so that we will consider combining the
> +   aliasing checks of those two pairs of data dependent data
> +   refs.  */
> +
> +bool operator == (const dr_addr_with_seg_len& d1,
> +                  const dr_addr_with_seg_len& d2)
> +{
> +  return operand_equal_p (d1.basic_addr, d2.basic_addr, 0)
> +         && operand_equal_p (d1.offset, d2.offset, 0)
> +         && operand_equal_p (d1.seg_len, d2.seg_len, 0);
> +}
> +
> +typedef std::pair <dr_addr_with_seg_len, dr_addr_with_seg_len>
> +        dr_addr_with_seg_len_pair_t;
> +
> +
> +/* Operator < between two dr_addr_with_seg_len_pair_t objects.
> +
> +   This operator is used to sort objects of dr_addr_with_seg_len_pair_t
> +   so that we can combine aliasing checks during one scan.  */
> +
> +bool operator < (const dr_addr_with_seg_len_pair_t& p1,
> +                 const dr_addr_with_seg_len_pair_t& p2)
> +{
> +  const dr_addr_with_seg_len& p11 = p1.first;
> +  const dr_addr_with_seg_len& p12 = p1.second;
> +  const dr_addr_with_seg_len& p21 = p2.first;
> +  const dr_addr_with_seg_len& p22 = p2.second;
> +
> +  if (p11.basic_addr != p21.basic_addr)
> +    return p11.basic_addr < p21.basic_addr;
> +  if (p12.basic_addr != p22.basic_addr)
> +    return p12.basic_addr < p22.basic_addr;
> +  if (TREE_CODE (p11.offset) != INTEGER_CST
> +      || TREE_CODE (p21.offset) != INTEGER_CST)
> +    return p11.offset < p21.offset;
> +  if (int_cst_value (p11.offset) != int_cst_value (p21.offset))
> +    return int_cst_value (p11.offset) < int_cst_value (p21.offset);
> +  if (TREE_CODE (p12.offset) != INTEGER_CST
> +      || TREE_CODE (p22.offset) != INTEGER_CST)
> +    return p12.offset < p22.offset;
> +  return int_cst_value (p12.offset) < int_cst_value (p22.offset);
> +}
> +
> +}
>
>  /* Function vect_create_cond_for_alias_checks.
>
> @@ -2292,20 +2364,51 @@ vect_create_cond_for_alias_checks (loop_
>    if (may_alias_ddrs.is_empty ())
>      return;
>
> +
> +  /* Basically, for each pair of dependent data refs store_ptr_0
> +     and load_ptr_0, we create an expression:
> +
> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
> +      || ((load_ptr_0 + load_segment_length_0) <= store_ptr_0)
> +
> +     for aliasing checks.
> +     However, in some cases we can decrease
> +     the number of checks by combining two checks into one.  For
> +     example, suppose we have another pair of data refs store_ptr_0
> +     and load_ptr_1, and if the following condition is satisfied:
> +
> +     load_ptr_0 < load_ptr_1 &&
> +     load_ptr_1 - load_ptr_0 - load_segment_length_0 < store_segment_length_0
> +
> +     (this condition means that, in each iteration of the vectorized loop,
> +     the memory accessed through store_ptr_0 cannot be between the memory
> +     accessed through load_ptr_0 and load_ptr_1), then we can use only the
> +     following expression to finish the aliasing checks between
> +     store_ptr_0 & load_ptr_0 and store_ptr_0 & load_ptr_1:
> +
> +     ((store_ptr_0 + store_segment_length_0) <= load_ptr_0)
> +      || ((load_ptr_1 + load_segment_length_1) <= store_ptr_0)
> +
> +     Note that we only consider the case in which load_ptr_0 and
> +     load_ptr_1 have the same base address.  */
> +
> +  std::vector<dr_addr_with_seg_len_pair_t> ddrs_with_seg_len;
> +
> +  /* First, we collect all data ref pairs for aliasing checks.  */
> +
>    FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>      {
>        struct data_reference *dr_a, *dr_b;
>        gimple dr_group_first_a, dr_group_first_b;
> -      tree addr_base_a, addr_base_b;
>        tree segment_length_a, segment_length_b;
>        gimple stmt_a, stmt_b;
> -      tree seg_a_min, seg_a_max, seg_b_min, seg_b_max;
>
>        dr_a = DDR_A (ddr);
>        stmt_a = DR_STMT (DDR_A (ddr));
>        dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
>        if (dr_group_first_a)
> -        {
> +        {
>            stmt_a = dr_group_first_a;
>            dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_a));
>          }
> @@ -2314,20 +2417,11 @@ vect_create_cond_for_alias_checks (loop_
>        stmt_b = DR_STMT (DDR_B (ddr));
>        dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
>        if (dr_group_first_b)
> -        {
> +        {
>            stmt_b = dr_group_first_b;
>            dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt_b));
>          }
>
> -      addr_base_a
> -        = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a),
> -                                   size_binop (PLUS_EXPR, DR_OFFSET (dr_a),
> -                                               DR_INIT (dr_a)));
> -      addr_base_b
> -        = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b),
> -                                   size_binop (PLUS_EXPR, DR_OFFSET (dr_b),
> -                                               DR_INIT (dr_b)));
> -
>        if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
>          length_factor = scalar_loop_iters;
>        else
> @@ -2335,24 +2429,149 @@ vect_create_cond_for_alias_checks (loop_
>        segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
>        segment_length_b = vect_vfa_segment_size (dr_b, length_factor);
>
> +      dr_addr_with_seg_len_pair_t dr_with_seg_len_pair
> +        (dr_addr_with_seg_len
> +           (dr_a, DR_BASE_ADDRESS (dr_a),
> +            size_binop (PLUS_EXPR, DR_OFFSET (dr_a), DR_INIT (dr_a)),
> +            segment_length_a),
> +         dr_addr_with_seg_len
> +           (dr_b, DR_BASE_ADDRESS (dr_b),
> +            size_binop (PLUS_EXPR, DR_OFFSET (dr_b), DR_INIT (dr_b)),
> +            segment_length_b));
> +
> +      if (dr_with_seg_len_pair.first.basic_addr
> +          > dr_with_seg_len_pair.second.basic_addr)
> +        std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
> +
> +      ddrs_with_seg_len.push_back (dr_with_seg_len_pair);
> +    }
> +
> +  /* Second, we sort the collected data ref pairs so that we can scan
> +     them once to combine all possible aliasing checks.  */
> +
> +  std::sort (ddrs_with_seg_len.begin(), ddrs_with_seg_len.end());
> +
> +  /* Remove duplicate data ref pairs.  */
> +  ddrs_with_seg_len.erase (std::unique (ddrs_with_seg_len.begin(),
> +                                        ddrs_with_seg_len.end()),
> +                           ddrs_with_seg_len.end());
> +
> +  /* We then scan the sorted dr pairs and check if we can combine
> +     alias checks of two neighbouring dr pairs.  */
> +
> +  for (size_t i = 1; i < ddrs_with_seg_len.size (); ++i)
> +    {
> +      dr_addr_with_seg_len& dr_a1 = ddrs_with_seg_len[i-1].first;
> +      dr_addr_with_seg_len& dr_b1 = ddrs_with_seg_len[i-1].second;
> +      dr_addr_with_seg_len& dr_a2 = ddrs_with_seg_len[i].first;
> +      dr_addr_with_seg_len& dr_b2 = ddrs_with_seg_len[i].second;
> +
> +      if (dr_a1 == dr_a2)
> +        {
> +          if (dr_b1.basic_addr != dr_b2.basic_addr
> +              || TREE_CODE (dr_b1.offset) != INTEGER_CST
> +              || TREE_CODE (dr_b2.offset) != INTEGER_CST)
> +            continue;
> +
> +          int diff = int_cst_value (dr_b2.offset)
> +                     - int_cst_value (dr_b1.offset);
> +
> +          gcc_assert (diff > 0);
> +
> +          if (diff <= vect_factor
> +              || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
> +                  && diff - int_cst_value (dr_b1.seg_len) < vect_factor)
> +              || (TREE_CODE (dr_b1.seg_len) == INTEGER_CST
> +                  && TREE_CODE (dr_a1.seg_len) == INTEGER_CST
> +                  && diff - int_cst_value (dr_b1.seg_len)
> +                     < int_cst_value (dr_a1.seg_len)))
> +            {
> +              if (dump_enabled_p ())
> +                {
> +                  dump_printf_loc
> +                    (MSG_NOTE, vect_location,
> +                     "combining two runtime checks for data references ");
> +                  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b1.dr));
> +                  dump_printf (MSG_NOTE, " and ");
> +                  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b2.dr));
> +                  dump_printf (MSG_NOTE, "\n");
> +                }
> +
> +              dr_b1.seg_len = size_binop (PLUS_EXPR,
> +                                          dr_b2.seg_len, size_int (diff));
> +              ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
> +              --i;
> +            }
> +        }
> +      else if (dr_b1 == dr_b2)
> +        {
> +          if (dr_a1.basic_addr != dr_a2.basic_addr
> +              || TREE_CODE (dr_a1.offset) != INTEGER_CST
> +              || TREE_CODE (dr_a2.offset) != INTEGER_CST)
> +            continue;
> +
> +          int diff = int_cst_value (dr_a2.offset)
> +                     - int_cst_value (dr_a1.offset);
> +
> +          gcc_assert (diff > 0);
> +
> +          if (diff <= vect_factor
> +              || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
> +                  && diff - int_cst_value (dr_a1.seg_len) < vect_factor)
> +              || (TREE_CODE (dr_a1.seg_len) == INTEGER_CST
> +                  && TREE_CODE (dr_b1.seg_len) == INTEGER_CST
> +                  && diff - int_cst_value (dr_a1.seg_len)
> +                     < int_cst_value (dr_b1.seg_len)))
> +            {
> +              if (dump_enabled_p ())
> +                {
> +                  dump_printf_loc
> +                    (MSG_NOTE, vect_location,
> +                     "combining two runtime checks for data references ");
> +                  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a1.dr));
> +                  dump_printf (MSG_NOTE, " and ");
> +                  dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a2.dr));
> +                  dump_printf (MSG_NOTE, "\n");
> +                }
> +
> +              dr_a1.seg_len = size_binop (PLUS_EXPR,
> +                                          dr_a2.seg_len, size_int (diff));
> +              ddrs_with_seg_len.erase (ddrs_with_seg_len.begin () + i);
> +              --i;
> +            }
> +        }
> +    }
> +
> +  for (size_t i = 0, s = ddrs_with_seg_len.size (); i < s; ++i)
> +    {
> +      const dr_addr_with_seg_len& dr_a = ddrs_with_seg_len[i].first;
> +      const dr_addr_with_seg_len& dr_b = ddrs_with_seg_len[i].second;
> +      tree segment_length_a = dr_a.seg_len;
> +      tree segment_length_b = dr_b.seg_len;
> +
> +      tree addr_base_a
> +        = fold_build_pointer_plus (dr_a.basic_addr, dr_a.offset);
> +      tree addr_base_b
> +        = fold_build_pointer_plus (dr_b.basic_addr, dr_b.offset);
> +
>        if (dump_enabled_p ())
>          {
>            dump_printf_loc (MSG_NOTE, vect_location,
> -                           "create runtime check for data references ");
> -          dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a));
> +                           "create runtime check for data references ");
> +          dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_a.dr));
>            dump_printf (MSG_NOTE, " and ");
> -          dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b));
> -          dump_printf (MSG_NOTE, "\n");
> +          dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (dr_b.dr));
> +          dump_printf (MSG_NOTE, "\n");
>          }
>
> -      seg_a_min = addr_base_a;
> -      seg_a_max = fold_build_pointer_plus (addr_base_a, segment_length_a);
> -      if (tree_int_cst_compare (DR_STEP (dr_a), size_zero_node) < 0)
> +      tree seg_a_min = addr_base_a;
> +      tree seg_a_max = fold_build_pointer_plus (addr_base_a,
> +                                                segment_length_a);
> +      if (tree_int_cst_compare (DR_STEP (dr_a.dr), size_zero_node) < 0)
>          seg_a_min = seg_a_max, seg_a_max = addr_base_a;
>
> -      seg_b_min = addr_base_b;
> -      seg_b_max = fold_build_pointer_plus (addr_base_b, segment_length_b);
> -      if (tree_int_cst_compare (DR_STEP (dr_b), size_zero_node) < 0)
> +      tree seg_b_min = addr_base_b;
> +      tree seg_b_max = fold_build_pointer_plus (addr_base_b,
> +                                                segment_length_b);
> +      if (tree_int_cst_compare (DR_STEP (dr_b.dr), size_zero_node) < 0)
>          seg_b_min = seg_b_max, seg_b_max = addr_base_b;
>
>        part_cond_expr =
> @@ -2477,6 +2696,81 @@ vect_loop_versioning (loop_vec_info loop
>        adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
>      }
>
> +  /* Extract load and store statements on pointers with zero-stride
> +     accesses.  */
> +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
> +    {
> +
> +      /* In the loop body, we iterate each statement to check if it is a load
> +         or store.  Then we check the DR_STEP of the data reference.  If
> +         DR_STEP is zero, then we will hoist the load statement to the loop
> +         preheader, and move the store statement to the loop exit.  */
> +
> +      for (gimple_stmt_iterator si = gsi_start_bb (loop->header);
> +           !gsi_end_p (si); )
> +        {
> +          gimple stmt = gsi_stmt (si);
> +          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +          struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> +
> +          if (dr && integer_zerop (DR_STEP (dr)))
> +            {
> +              if (DR_IS_READ (dr))
> +                {
> +                  if (dump_file)
> +                    {
> +                      fprintf (dump_file,
> +                               "Hoist the load to outside of the loop:\n");
> +                      print_gimple_stmt (dump_file, stmt, 0,
> +                                         TDF_VOPS|TDF_MEMSYMS);
> +                    }
> +
> +                  basic_block preheader = loop_preheader_edge (loop)->src;
> +                  gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
> +                  gsi_move_after (&si, &si_dst);
> +                }
> +              else
> +                {
> +                  gimple_stmt_iterator si_dst =
> +                    gsi_last_bb (single_exit (loop)->dest);
> +                  gsi_move_after (&si, &si_dst);
> +                }
> +              continue;
> +            }
> +          else if (!dr)
> +            {
> +              bool hoist = true;
> +              for (size_t i = 0; i < gimple_num_ops (stmt); i++)
> +                {
> +                  tree op = gimple_op (stmt, i);
> +                  if (TREE_CODE (op) == INTEGER_CST
> +                      || TREE_CODE (op) == REAL_CST)
> +                    continue;
> +                  if (TREE_CODE (op) == SSA_NAME)
> +                    {
> +                      gimple def = SSA_NAME_DEF_STMT (op);
> +                      if (def == stmt
> +                          || gimple_nop_p (def)
> +                          || !flow_bb_inside_loop_p (loop, gimple_bb (def)))
> +                        continue;
> +                    }
> +                  hoist = false;
> +                  break;
> +                }
> +
> +              if (hoist)
> +                {
> +                  basic_block preheader = loop_preheader_edge (loop)->src;
> +                  gimple_stmt_iterator si_dst = gsi_last_bb (preheader);
> +                  gsi_move_after (&si, &si_dst);
> +                  continue;
> +                }
> +            }
> +          gsi_next (&si);
> +        }
> +    }
> +
>    /* End loop-exit-fixes after versioning.  */
>
>    if (cond_expr_stmt_list)
> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog (revision 202663)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,8 @@
> +2013-10-01  Cong Hou  <co...@google.com>
> +
> +	* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Combine
> +	alias checks if it is possible to amortize the runtime overhead.
> +
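As an aside on the vect_loop_versioning hunk above: the following is a
source-level sketch of the zero-stride hoisting it describes, given purely
for illustration. The function scale and the pseudo-check no_overlap are
invented for the example; the actual patch moves GIMPLE statements between
the loop preheader, body and exit blocks rather than rewriting C source.

/* A loop with a zero-stride (loop-invariant) load b[0].  */
void scale (int *a, int *b, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = a[i] * b[0];   /* DR_STEP of the b[0] access is zero.  */
}

/* Conceptually, in the loop version guarded by the runtime alias check,
   the invariant load can be moved to the preheader:

     if (no_overlap (a, n, b))        // alias check added by versioning
       {
         int t = b[0];                // load hoisted to the loop preheader
         for (int i = 0; i < n; ++i)
           a[i] = a[i] * t;           // loop to be vectorized
       }
     else
       for (int i = 0; i < n; ++i)    // original loop, aliasing possible
         a[i] = a[i] * b[0];
*/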