On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries <tom_devr...@mentor.com> wrote: > On 10/01/2011 05:46 PM, Tom de Vries wrote: >> On 09/30/2011 03:29 PM, Richard Guenther wrote: >>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries <tom_devr...@mentor.com> >>> wrote: >>>> On 09/28/2011 11:53 AM, Richard Guenther wrote: >>>>> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries <tom_devr...@mentor.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> I got a patch for PR50527. >>>>>> >>>>>> The patch prevents the alignment of vla-related allocas to be set to >>>>>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after >>>>>> folding >>>>>> the alloca. >>>>>> >>>>>> Bootstrapped and regtested on x86_64. >>>>>> >>>>>> OK for trunk? >>>>> >>>>> Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that >>>>> the vectorizer then will no longer see that the arrays are properly >>>>> aligned. >>>>> >>>>> I'm not sure what the best thing to do is here, other than trying to >>>>> record >>>>> the alignment requirement of the VLA somewhere. >>>>> >>>>> Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT >>>>> has the issue that it will force stack-realignment which isn't free (and >>>>> the >>>>> point was to make the decl cheaper than the alloca). But that might >>>>> possibly be the better choice. >>>>> >>>>> Any other thoughts? >>>> >>>> How about the approach in this (untested) patch? Using the DECL_ALIGN of >>>> the vla >>>> for the new array prevents stack realignment for folded vla-allocas, also >>>> for >>>> large vlas. >>>> >>>> This will not help in vectorizing large folded vla-allocas, but I think >>>> it's not >>>> reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although that >>>> has >>>> been the case up until we started to fold). If you want to trigger >>>> vectorization >>>> for a vla, you can still use the aligned attribute on the declaration. >>>> >>>> Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also without >>>> using >>>> an attribute on the decl. This patch exploits this by setting it at the >>>> end of >>>> the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective in >>>> propagation though, because although the ptr_info of the lhs is propagated >>>> via >>>> copy_prop afterwards, it's not propagated anymore via ccp. >>>> >>>> Another way to do this would be to set BIGGEST_ALIGNMENT at the end of >>>> ccp2 and >>>> not fold during ccp3. >>> >>> Ugh, somehow I like this the least ;) >>> >>> How about lowering VLAs to >>> >>> p = __builtin_alloca (...); >>> p = __builtin_assume_aligned (p, DECL_ALIGN (vla)); >>> >>> and not assume anything for alloca itself if it feeds a >>> __builtin_assume_aligned? >>> >>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do >>> >>> p = __builtin_alloca_with_align (..., DECL_ALIGN (vla)); >>> >>> that's less awkward to use? >>> >>> Sorry for not having a clear plan here ;) >>> >> >> Using assume_aligned is a more orthogonal way to represent this in gimple, >> but >> indeed harder to use. >> >> Another possibility is to add a 'tree vla_decl' field to struct >> gimple_statement_call, which is probably the easiest to implement. >> >> But I think __builtin_alloca_with_align might have a use beyond vlas, so I >> decided to try this one. Attached patch implements my first stab at this >> (now >> testing on x86_64). >> >> Is this an acceptable approach? >> > > bootstrapped and reg-tested (including ada) on x86_64. > > Ok for trunk?
The idea is ok I think. But case BUILT_IN_ALLOCA: + case BUILT_IN_ALLOCA_WITH_ALIGN: /* If the allocation stems from the declaration of a variable-sized object, it cannot accumulate. */ target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp)); if (target) return target; + if (DECL_FUNCTION_CODE (get_callee_fndecl (exp)) + == BUILT_IN_ALLOCA_WITH_ALIGN) + { + tree new_call = build_call_expr_loc (EXPR_LOCATION (exp), + built_in_decls[BUILT_IN_ALLOCA], + 1, CALL_EXPR_ARG (exp, 0)); + CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp); + exp = new_call; + } Ick. Why can't the rest of the compiler not handle BUILT_IN_ALLOCA_WITH_ALIGN the same as BUILT_IN_ALLOCA? (thus, arrange things so the assembler name of alloca-with-align is alloca?) I don't see why you still need the special late CCP pass. -static tree -fold_builtin_alloca_for_var (gimple stmt) +static bool +builtin_alloca_with_align_p (gimple stmt) { - unsigned HOST_WIDE_INT size, threshold, n_elem; - tree lhs, arg, block, var, elem_type, array_type; - unsigned int align; + tree fndecl; + if (!is_gimple_call (stmt)) + return false; + + fndecl = gimple_call_fndecl (stmt); + + return (fndecl != NULL_TREE + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA_WITH_ALIGN); +} equivalent to gimple_call_builtin_p (stmt, BUILT_IN_ALLOCA_WITH_ALIGN). + DECL_ALIGN (var) = TREE_INT_CST_LOW (gimple_call_arg (stmt, 1)); Now, users might call __builtin_alloca_with_align (8, align), thus use a non-constant alignment argument. You either need to reject this in c-family/c-common.c:check_builtin_function_arguments with an error or fall back to BIGGEST_ALIGNMENT (similarly during propagation I suppose). +DEF_BUILTIN_STUB (BUILT_IN_ALLOCA_WITH_ALIGN, "alloca_with_align") Oh, I see you are not exposing this to users, so ignore the comments about non-constant align. Can you move this DEF down, near the other DEF_BUILTIN_STUBs and add a comment that this is used for VLAs? + build_int_cstu (size_type_node, + DECL_ALIGN (parm))); size_int (DECL_ALIGN (parm)), similar in other places. Let's see if there are comments from others on this - which I like. Thanks, Richard. > Thanks, > - Tom > > 2011-10-03 Tom de Vries <t...@codesourcery.com> > > PR middle-end/50527 > * tree.c (build_common_builtin_nodes): Add local_define_builtin for > BUILT_IN_ALLOCA_WITH_ALIGN. > * builtins.c (expand_builtin_alloca): Handle BUILT_IN_ALLOCA_WITH_ALIGN > arglist. > (expand_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN. Expand > BUILT_IN_ALLOCA_WITH_ALIGN as BUILT_IN_ALLOCA. > (is_inexpensive_builtin): Handle BUILT_IN_ALLOCA_WITH_ALIGN. > * tree-pass.h (pass_ccp_last): Declare. > * passes.c (init_optimization_passes): Replace 3rd pass_ccp with > pass_ccp_last. > * gimplify.c (gimplify_vla_decl): Lower vla to > BUILT_IN_ALLOCA_WITH_ALIGN. > * tree-ssa-ccp.c (cpp_last): Define. > (ccp_finalize): Improve align ptr_info for BUILT_IN_ALLOCA_WITH_ALIGN. > (evaluate_stmt): Set align for BUILT_IN_ALLOCA_WITH_ALIGN. > (builtin_alloca_with_align_p): New function. > (fold_builtin_alloca_with_align_p): New function, factored out of ... > (fold_builtin_alloca_for_var): Renamed to ... > (fold_builtin_alloca_with_align): Use fold_builtin_alloca_with_align_p. > Set DECL_ALIGN from BUILT_IN_ALLOCA_WITH_ALIGN argument. > (ccp_fold_stmt): Try folding using fold_builtin_alloca_with_align if > builtin_alloca_with_align_p. > (do_ssa_ccp_last): New function. > (pass_ccp_last): Define. > (optimize_stack_restore): Handle BUILT_IN_ALLOCA_WITH_ALIGN. > * builtins.def (BUILT_IN_ALLOCA_WITH_ALIGN): Declare using > DEF_BUILTIN_STUB. > * ipa-pure-const.c (special_builtin_state): Handle > BUILT_IN_ALLOCA_WITH_ALIGN. > * tree-ssa-alias.c (ref_maybe_used_by_call_p_1 > (call_may_clobber_ref_p_1): Same. > function.c (gimplify_parameters): Lower vla to > BUILT_IN_ALLOCA_WITH_ALIGN. > cfgexpand.c (expand_call_stmt): Handle BUILT_IN_ALLOCA_WITH_ALIGN. > tree-mudflap.c (mf_xform_statements): Same. > tree-ssa-dce.c (mark_stmt_if_obviously_necessary) > (mark_all_reaching_defs_necessary_1, propagate_necessity): Same. > varasm.c (incorporeal_function_p): Same. > tree-object-size.c (alloc_object_size): Same. > gimple.c (gimple_build_call_from_tree): Same. > > * gcc.dg/pr50527.c: New test. > > >