On Thu, Nov 21, 2013 at 8:26 AM, Wei Mi <w...@google.com> wrote: > Hi, > > This patch works on the intrinsic calls handling issue in IVOPT mentioned > here: > http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html > > In find_interesting_uses_stmt, it changes > > arg = expr > __builtin_xxx (arg) > > to > > arg = expr; > tmp = addr_expr (mem_ref(arg)); > __builtin_xxx (tmp, ...) > > So mem_ref(arg) could be handled by find_interesting_uses_address, and > an iv use of USE_ADDRESS type will be generated for expr, then a TMR > will be generated for expr in rewrite_use_address. Expand pass is > changed accordingly to keep the complex addressing mode not to be > splitted for cse. > > With the patch we can handle the motivational case below. > > #include <x86intrin.h> > extern __m128i arr[], d[]; > void test (void) > { > unsigned int b; > for (b = 0; b < 1000; b += 2) { > __m128i *p = (__m128i *)(&d[b]); > __m128i a = _mm_load_si128(&arr[4*b+3]); > __m128i v = _mm_loadu_si128(p); > v = _mm_xor_si128(v, a); > _mm_storeu_si128(p, v); > } > } > > gcc-r204792 Without the patch: > .L2: > movdqu (%rax), %xmm0 > subq $-128, %rdx > addq $32, %rax > pxor -128(%rdx), %xmm0 > movups %xmm0, -32(%rax) > cmpq $arr+64048, %rdx > jne .L2 > > gcc-r204792 With the patch: > .L2: > movdqu d(%rax), %xmm0 > pxor arr+48(,%rax,4), %xmm0 > addq $32, %rax > movups %xmm0, d-32(%rax) > cmpq $16000, %rax > jne .L2 > > Following things to be addressed later: > 1. TER needs to be extended to handle the case when TMR is csed. > > 2. For more complex cases to work, besides this patch, we also need to > tune the AVG_LOOP_NITER, which is now set to 5, and it limits > induction variables merging a lot. Increasing the parameter to a > larger one could remove some induction variable in critical loop in > some our benchmarks. reg pressure estimation may also need to be > tuned. I will address it in a separate patch. > > 3. Now the information about which param of a builtin is of memory > reference type is simply listed as a switch-case in > builtin_has_mem_ref_p and ix86_builtin_has_mem_ref_p. This is not > ideal but there is no infrastructure to describe it in existing > implementation. More detailed information such as parameter and call > side-effect will be important for more precise alias and may worth > some work. Maybe the refinement about this patch could be done after > that. > > regression and bootstrap pass on x86_64-linux-gnu. ok for trunk?
So what you are doing is basically not only rewriting memory references to possibly use TARGET_MEM_REF but also address uses to use &TARGET_MEM_REF. I think this is a good thing in general (given instructions like x86 lea) and I would not bother distinguishing the different kind of uses. Richard. > Thanks, > Wei. > > 2013-11-20 Wei Mi <w...@google.com> > > * expr.c (expand_expr_addr_expr_1): Not to split TMR. > (expand_expr_real_1): Ditto. > * targhooks.c (default_builtin_has_mem_ref_p): Default > builtin. > * tree-ssa-loop-ivopts.c (struct iv): Add field builtin_mem_param. > (alloc_iv): Ditto. > (remove_unused_ivs): Ditto. > (builtin_has_mem_ref_p): New function. > (find_interesting_uses_stmt): Special handling of builtins. > * gimple-expr.c (is_gimple_address): Add handling of TMR. > * gimple-expr.h (is_gimple_addressable): Ditto. > * config/i386/i386.c (ix86_builtin_has_mem_ref_p): New target hook. > (ix86_atomic_assign_expand_fenv): Ditto. > (ix86_expand_special_args_builtin): Special handling of TMR for > builtin. > * target.def (builtin_has_mem_ref_p): New hook. > * doc/tm.texi.in: Ditto. > * doc/tm.texi: Generated. > > 2013-11-20 Wei Mi <w...@google.com> > > * gcc.dg/tree-ssa/ivopt_5.c: New test. > > Index: expr.c > =================================================================== > --- expr.c (revision 204792) > +++ expr.c (working copy) > @@ -7467,7 +7467,19 @@ expand_expr_addr_expr_1 (tree exp, rtx t > tem = fold_build_pointer_plus (tem, TREE_OPERAND (exp, 1)); > return expand_expr (tem, target, tmode, modifier); > } > + case TARGET_MEM_REF: > + { > + int old_cse_not_expected; > + addr_space_t as > + = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0)))); > > + result = addr_for_mem_ref (exp, as, true); > + old_cse_not_expected = cse_not_expected; > + cse_not_expected = true; > + result = memory_address_addr_space (tmode, result, as); > + cse_not_expected = old_cse_not_expected; > + return result; > + } > case CONST_DECL: > /* Expand the initializer like constants above. */ > result = XEXP (expand_expr_constant (DECL_INITIAL (exp), > @@ -9526,9 +9538,13 @@ expand_expr_real_1 (tree exp, rtx target > = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0)))); > enum insn_code icode; > unsigned int align; > + int old_cse_not_expected; > > op0 = addr_for_mem_ref (exp, as, true); > + old_cse_not_expected = cse_not_expected; > + cse_not_expected = true; > op0 = memory_address_addr_space (mode, op0, as); > + cse_not_expected = old_cse_not_expected; > temp = gen_rtx_MEM (mode, op0); > set_mem_attributes (temp, exp, 0); > set_mem_addr_space (temp, as); > Index: targhooks.c > =================================================================== > --- targhooks.c (revision 204792) > +++ targhooks.c (working copy) > @@ -566,6 +566,13 @@ default_builtin_reciprocal (unsigned int > } > > bool > +default_builtin_has_mem_ref_p (int built_in_function ATTRIBUTE_UNUSED, > + int i ATTRIBUTE_UNUSED) > +{ > + return false; > +} > + > +bool > hook_bool_CUMULATIVE_ARGS_mode_tree_bool_false ( > cumulative_args_t ca ATTRIBUTE_UNUSED, > enum machine_mode mode ATTRIBUTE_UNUSED, > Index: tree-ssa-loop-ivopts.c > =================================================================== > --- tree-ssa-loop-ivopts.c (revision 204792) > +++ tree-ssa-loop-ivopts.c (working copy) > @@ -135,6 +135,8 @@ struct iv > tree ssa_name; /* The ssa name with the value. */ > bool biv_p; /* Is it a biv? */ > bool have_use_for; /* Do we already have a use for it? */ > + bool builtin_mem_param; /* Used as param of a builtin, so it could not be > + removed by remove_unused_ivs. */ > unsigned use_id; /* The identifier in the use if it is the case. */ > }; > > @@ -952,6 +954,7 @@ alloc_iv (tree base, tree step) > iv->step = step; > iv->biv_p = false; > iv->have_use_for = false; > + iv->builtin_mem_param = false; > iv->use_id = 0; > iv->ssa_name = NULL_TREE; > > @@ -1874,13 +1877,36 @@ find_invariants_stmt (struct ivopts_data > } > } > > +/* Find whether the Ith param of the BUILTIN is a mem > + reference. If I is -1, it returns whether the BUILTIN > + contains any mem reference type param. */ > + > +static bool > +builtin_has_mem_ref_p (gimple builtin, int i) > +{ > + tree fndecl = gimple_call_fndecl (builtin); > + if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL) > + { > + switch (DECL_FUNCTION_CODE (fndecl)) > + { > + case BUILT_IN_PREFETCH: > + if (i == -1 || i == 0) > + return true; > + } > + } > + else if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD) > + return targetm.builtin_has_mem_ref_p ((int) DECL_FUNCTION_CODE > (fndecl), i); > + > + return false; > +} > + > /* Finds interesting uses of induction variables in the statement STMT. */ > > static void > find_interesting_uses_stmt (struct ivopts_data *data, gimple stmt) > { > struct iv *iv; > - tree op, *lhs, *rhs; > + tree op, *lhs, *rhs, callee; > ssa_op_iter iter; > use_operand_p use_p; > enum tree_code code; > @@ -1937,6 +1963,74 @@ find_interesting_uses_stmt (struct ivopt > > call (memory). */ > } > + else if (is_gimple_call (stmt) > + && (callee = gimple_call_fndecl (stmt)) > + && is_builtin_fn (callee) > + && builtin_has_mem_ref_p (stmt, -1)) > + { > + size_t i; > + for (i = 0; i < gimple_call_num_args (stmt); i++) > + { > + if (builtin_has_mem_ref_p (stmt, i)) > + { > + gimple def, g; > + gimple_seq seq = NULL; > + tree type, mem, addr, rhs; > + tree *arg = gimple_call_arg_ptr (stmt, i); > + location_t loc = gimple_location (stmt); > + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); > + > + if (TREE_CODE (*arg) != SSA_NAME) > + continue; > + > + def = SSA_NAME_DEF_STMT (*arg); > + gcc_assert (gimple_code (def) == GIMPLE_PHI > + || is_gimple_assign (def)); > + /* Suppose we have the case: > + arg = expr; > + call (arg) > + If the expr is not like the form: &MEM(...), change it to: > + arg = expr; > + tmp = &MEM(arg); > + call(tmp); > + then try to find interesting uses address in MEM(arg). */ > + if (is_gimple_assign (def) > + && (rhs = gimple_assign_rhs1(def)) > + && TREE_CODE (rhs) == ADDR_EXPR > + && REFERENCE_CLASS_P (TREE_OPERAND (rhs, 0))) > + { > + iv = get_iv (data, *arg); > + if (iv && !iv->builtin_mem_param) > + iv->builtin_mem_param = true; > + > + find_interesting_uses_address (data, def, > + &TREE_OPERAND (rhs, 0)); > + } > + else > + { > + mem = build2 (MEM_REF, TREE_TYPE (*arg), *arg, > + build_int_cst (TREE_TYPE (*arg), 0)); > + type = build_pointer_type (TREE_TYPE (*arg)); > + addr = build1 (ADDR_EXPR, type, mem); > + g = gimple_build_assign_with_ops (ADDR_EXPR, > + make_ssa_name (type, > NULL), > + addr, NULL); > + gimple_call_set_arg (stmt, i, gimple_assign_lhs (g)); > + update_stmt (stmt); > + gimple_set_location (g, loc); > + gimple_seq_add_stmt_without_update (&seq, g); > + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); > + find_interesting_uses_address (data, g, > &TREE_OPERAND (addr, 0)); > + } > + } > + else > + { > + tree arg = gimple_call_arg (stmt, i); > + find_interesting_uses_op (data, arg); > + } > + } > + return; > + } > > if (gimple_code (stmt) == GIMPLE_PHI > && gimple_bb (stmt) == data->current_loop->header) > @@ -6507,10 +6601,10 @@ remove_unused_ivs (struct ivopts_data *d > && !integer_zerop (info->iv->step) > && !info->inv_id > && !info->iv->have_use_for > + && !info->iv->builtin_mem_param > && !info->preserve_biv) > { > bitmap_set_bit (toremove, SSA_NAME_VERSION (info->iv->ssa_name)); > - > tree def = info->iv->ssa_name; > > if (MAY_HAVE_DEBUG_STMTS && SSA_NAME_DEF_STMT (def)) > Index: gimple-expr.c > =================================================================== > --- gimple-expr.c (revision 204792) > +++ gimple-expr.c (working copy) > @@ -633,7 +633,8 @@ is_gimple_address (const_tree t) > op = TREE_OPERAND (op, 0); > } > > - if (CONSTANT_CLASS_P (op) || TREE_CODE (op) == MEM_REF) > + if (CONSTANT_CLASS_P (op) || TREE_CODE (op) == MEM_REF > + || TREE_CODE (op) == TARGET_MEM_REF) > return true; > > switch (TREE_CODE (op)) > Index: gimple-expr.h > =================================================================== > --- gimple-expr.h (revision 204792) > +++ gimple-expr.h (working copy) > @@ -122,7 +122,8 @@ static inline bool > is_gimple_addressable (tree t) > { > return (is_gimple_id (t) || handled_component_p (t) > - || TREE_CODE (t) == MEM_REF); > + || TREE_CODE (t) == MEM_REF > + || TREE_CODE (t) == TARGET_MEM_REF); > } > > /* Return true if T is a valid gimple constant. */ > Index: config/i386/i386.c > =================================================================== > --- config/i386/i386.c (revision 204792) > +++ config/i386/i386.c (working copy) > @@ -29639,6 +29639,50 @@ ix86_init_mmx_sse_builtins (void) > } > } > > +/* Return whether the Ith param of the BUILTIN_FUNCTION > + is a memory reference. If I == -1, return whether the > + BUILTIN_FUNCTION contains any memory reference param. */ > + > +static bool > +ix86_builtin_has_mem_ref_p (int builtin_function, int i) > +{ > + switch ((enum ix86_builtins) builtin_function) > + { > + /* LOAD. */ > + case IX86_BUILTIN_LOADHPS: > + case IX86_BUILTIN_LOADLPS: > + case IX86_BUILTIN_LOADHPD: > + case IX86_BUILTIN_LOADLPD: > + if (i == -1 || i == 1) > + return true; > + break; > + case IX86_BUILTIN_LOADUPS: > + case IX86_BUILTIN_LOADUPD: > + case IX86_BUILTIN_LOADDQU: > + case IX86_BUILTIN_LOADUPD256: > + case IX86_BUILTIN_LOADUPS256: > + case IX86_BUILTIN_LOADDQU256: > + case IX86_BUILTIN_LDDQU256: > + if (i == -1 || i == 0) > + return true; > + break; > + /* STORE. */ > + case IX86_BUILTIN_STOREHPS: > + case IX86_BUILTIN_STORELPS: > + case IX86_BUILTIN_STOREUPS: > + case IX86_BUILTIN_STOREUPD: > + case IX86_BUILTIN_STOREDQU: > + case IX86_BUILTIN_STOREUPD256: > + case IX86_BUILTIN_STOREUPS256: > + case IX86_BUILTIN_STOREDQU256: > + if (i == -1 || i == 0) > + return true; > + default: > + break; > + } > + return false; > +} > + > /* This adds a condition to the basic_block NEW_BB in function FUNCTION_DECL > to return a pointer to VERSION_DECL if the outcome of the expression > formed by PREDICATE_CHAIN is true. This function will be called during > @@ -32525,7 +32569,13 @@ ix86_expand_special_args_builtin (const > if (i == memory) > { > /* This must be the memory operand. */ > - op = force_reg (Pmode, convert_to_mode (Pmode, op, 1)); > + > + /* We expect the builtin could be expanded to rtl with memory > + operand and proper addressing mode will be kept as specified > + in TARGET_MEM_REF. */ > + if (!(TREE_CODE (arg) == ADDR_EXPR > + && TREE_CODE (TREE_OPERAND (arg, 0)) == TARGET_MEM_REF)) > + op = force_reg (Pmode, convert_to_mode (Pmode, op, 1)); > op = gen_rtx_MEM (mode, op); > gcc_assert (GET_MODE (op) == mode > || GET_MODE (op) == VOIDmode); > @@ -43737,6 +43787,9 @@ ix86_atomic_assign_expand_fenv (tree *ho > #undef TARGET_BUILTIN_RECIPROCAL > #define TARGET_BUILTIN_RECIPROCAL ix86_builtin_reciprocal > > +#undef TARGET_BUILTIN_HAS_MEM_REF_P > +#define TARGET_BUILTIN_HAS_MEM_REF_P ix86_builtin_has_mem_ref_p > + > #undef TARGET_ASM_FUNCTION_EPILOGUE > #define TARGET_ASM_FUNCTION_EPILOGUE ix86_output_function_epilogue > > Index: doc/tm.texi > =================================================================== > --- doc/tm.texi (revision 204792) > +++ doc/tm.texi (working copy) > @@ -709,6 +709,12 @@ If a target implements string objects th > If a target implements string objects then this hook should should > provide a facility to check the function arguments in @var{args_list} > against the format specifiers in @var{format_arg} where the type of > @var{format_arg} is one recognized as a valid string reference type. > @end deftypefn > > +@deftypefn {Target Hook} bool TARGET_BUILTIN_HAS_MEM_REF_P (int > @var{builtin_function}, int @var{i}) > +This hook return whether the @var{i}th param of the @var{builtin_function} > +is a memory reference. If @var{i} is -1, return whether the > @var{builtin_function} > +contains any memory reference type param. > +@end deftypefn > + > @deftypefn {Target Hook} void TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE (void) > This target function is similar to the hook @code{TARGET_OPTION_OVERRIDE} > but is called when the optimize level is changed via an attribute or > Index: doc/tm.texi.in > =================================================================== > --- doc/tm.texi.in (revision 204792) > +++ doc/tm.texi.in (working copy) > @@ -697,6 +697,8 @@ should use @code{TARGET_HANDLE_C_OPTION} > > @hook TARGET_CHECK_STRING_OBJECT_FORMAT_ARG > > +@hook TARGET_BUILTIN_HAS_MEM_REF_P > + > @hook TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE > > @defmac C_COMMON_OVERRIDE_OPTIONS > Index: target.def > =================================================================== > --- target.def (revision 204792) > +++ target.def (working copy) > @@ -1742,6 +1742,14 @@ HOOK_VECTOR_END (vectorize) > #undef HOOK_PREFIX > #define HOOK_PREFIX "TARGET_" > > +DEFHOOK > +(builtin_has_mem_ref_p, > + "This hook return whether the @var{i}th param of the > @var{builtin_function}\n\ > +is a memory reference. If @var{i} is -1, return whether the > @var{builtin_function}\n\ > +contains any memory reference type param.", > + bool, (int builtin_function, int i), > + default_builtin_has_mem_ref_p) > + > /* Allow target specific overriding of option settings after options have > been changed by an attribute or pragma or when it is reset at the > end of the code affected by an attribute or pragma. */ > Index: testsuite/gcc.dg/tree-ssa/ivopt_5.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) > @@ -0,0 +1,21 @@ > +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */ > +/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */ > + > +/* Make sure only one iv is selected after IVOPT. */ > + > +#include <x86intrin.h> > +extern __m128i arr[], d[]; > +void test (void) > +{ > + unsigned int b; > + for (b = 0; b < 1000; b += 2) { > + __m128i *p = (__m128i *)(&d[b]); > + __m128i a = _mm_load_si128(&arr[4*b+3]); > + __m128i v = _mm_loadu_si128(p); > + v = _mm_xor_si128(v, a); > + _mm_storeu_si128(p, v); > + } > +} > + > +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */ > +/* { dg-final { cleanup-tree-dump "ivopts" } } */