ping * 3 https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html

Thanks,
Prathamesh

On 5 July 2016 at 10:53, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
> ping * 2 https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html
>
> Thanks,
> Prathamesh
>
> On 28 June 2016 at 14:49, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>> ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html
>>
>> Thanks,
>> Prathamesh
>>
>> On 23 June 2016 at 22:51, Prathamesh Kulkarni
>> <prathamesh.kulka...@linaro.org> wrote:
>>> On 17 June 2016 at 19:52, Prathamesh Kulkarni
>>> <prathamesh.kulka...@linaro.org> wrote:
>>>> On 14 June 2016 at 18:31, Prathamesh Kulkarni
>>>> <prathamesh.kulka...@linaro.org> wrote:
>>>>> On 13 June 2016 at 16:13, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>>>>>> index ecafe63..41ac408 100644
>>>>>>> --- a/gcc/cgraph.h
>>>>>>> +++ b/gcc/cgraph.h
>>>>>>> @@ -1874,6 +1874,9 @@ public:
>>>>>>>       if we did not do any inter-procedural code movement.  */
>>>>>>>    unsigned used_by_single_function : 1;
>>>>>>>
>>>>>>> +  /* Set if -fsection-anchors is set.  */
>>>>>>> +  unsigned section_anchor : 1;
>>>>>>> +
>>>>>>>  private:
>>>>>>>    /* Assemble thunks and aliases associated to varpool node.  */
>>>>>>>    void assemble_aliases (void);
>>>>>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>>>>>>> index 4bfcad7..e75d5c0 100644
>>>>>>> --- a/gcc/cgraphunit.c
>>>>>>> +++ b/gcc/cgraphunit.c
>>>>>>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl)
>>>>>>>       it is available to notice_global_symbol.  */
>>>>>>>    node->definition = true;
>>>>>>>    notice_global_symbol (decl);
>>>>>>> +
>>>>>>> +  node->section_anchor = flag_section_anchors;
>>>>>>> +
>>>>>>>    if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl)
>>>>>>>        /* Traditionally we do not eliminate static variables when not
>>>>>>>           optimizing and when not doing toplevel reoder.  */
>>>>>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>>>>>> index f0d7196..e497795 100644
>>>>>>> --- a/gcc/common.opt
>>>>>>> +++ b/gcc/common.opt
>>>>>>> @@ -1590,6 +1590,10 @@ fira-algorithm=
>>>>>>>  Common Joined RejectNegative Enum(ira_algorithm) Var(flag_ira_algorithm) Init(IRA_ALGORITHM_CB) Optimization
>>>>>>>  -fira-algorithm=[CB|priority] Set the used IRA algorithm.
>>>>>>>
>>>>>>> +fipa-increase_alignment
>>>>>>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization
>>>>>>> +Option to gate increase_alignment ipa pass.
>>>>>>> +
>>>>>>>  Enum
>>>>>>>  Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA algorithm %qs)
>>>>>>>
>>>>>>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) Init(1) Optimization
>>>>>>>  Enable the dependent count heuristic in the scheduler.
>>>>>>>
>>>>>>>  fsection-anchors
>>>>>>> -Common Report Var(flag_section_anchors) Optimization
>>>>>>> +Common Report Var(flag_section_anchors)
>>>>>>>  Access data in the same section from shared anchor points.
>>>>>>>
>>>>>>>  fsee
>>>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>>>> index a0db3a4..1482566 100644
>>>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>>>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void)
>>>>>>>
>>>>>>>    aarch64_register_fma_steering ();
>>>>>>>
>>>>>>> +  /* Enable increase_alignment pass.  */
>>>>>>> +  flag_ipa_increase_alignment = 1;
>>>>>>
>>>>>> I would rather enable it always on targets that do support anchors.
>>>>> AFAIK aarch64 supports section anchors.
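>>>>>
>>>>> Would something along these lines be what you have in mind? A
>>>>> hypothetical, untested sketch, placed in generic option processing
>>>>> instead of aarch64_override_options, assuming that a non-NULL
>>>>> targetm.asm_out.output_anchor hook is a fair proxy for section
>>>>> anchor support:
>>>>>
>>>>>   /* Enable the increase_alignment pass only where section anchors
>>>>>      can actually be emitted; the output_anchor hook is NULL on
>>>>>      targets without anchor support.  */
>>>>>   if (targetm.asm_out.output_anchor != NULL)
>>>>>     flag_ipa_increase_alignment = 1;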
>>>>>>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
>>>>>>> index ce9e146..7f09f3a 100644
>>>>>>> --- a/gcc/lto/lto-symtab.c
>>>>>>> +++ b/gcc/lto/lto-symtab.c
>>>>>>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node *entry)
>>>>>>>       The type compatibility checks or the completing of types has properly
>>>>>>>       dealt with most issues.  */
>>>>>>>
>>>>>>> +  /* ??? is this assert necessary ?  */
>>>>>>> +  varpool_node *v_prevailing = dyn_cast<varpool_node *> (prevailing);
>>>>>>> +  varpool_node *v_entry = dyn_cast<varpool_node *> (entry);
>>>>>>> +  gcc_assert (v_prevailing && v_entry);
>>>>>>> +  /* section_anchor of prevailing_decl wins.  */
>>>>>>> +  v_entry->section_anchor = v_prevailing->section_anchor;
>>>>>>> +
>>>>>> Other flags are merged in lto_varpool_replace_node, so please move this
>>>>>> there.
>>>>> Ah indeed, thanks for the pointers.
>>>>> I wonder though if we need to set
>>>>> prevailing_node->section_anchor = vnode->section_anchor ?
>>>>> IIUC, the function merges flags from vnode into prevailing_node
>>>>> and removes vnode.  However, we want prevailing_node->section_anchor
>>>>> to always take precedence.
>>>>>>> +/* Return true if alignment should be increased for this vnode.
>>>>>>> +   This is done if every function referencing/referring to vnode
>>>>>>> +   has flag_tree_loop_vectorize set.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +increase_alignment_p (varpool_node *vnode)
>>>>>>> +{
>>>>>>> +  ipa_ref *ref;
>>>>>>> +
>>>>>>> +  for (int i = 0; vnode->iterate_reference (i, ref); i++)
>>>>>>> +    if (cgraph_node *cnode = dyn_cast<cgraph_node *> (ref->referred))
>>>>>>> +      {
>>>>>>> +        struct cl_optimization *opts = opts_for_fn (cnode->decl);
>>>>>>> +        if (!opts->x_flag_tree_loop_vectorize)
>>>>>>> +          return false;
>>>>>>> +      }
>>>>>>
>>>>>> Taking the address of a function that has the vectorizer enabled
>>>>>> probably doesn't imply a need to increase the alignment of that var,
>>>>>> so please drop this loop.
>>>>>>
>>>>>> You only want functions that read/write or take the address of the
>>>>>> symbol.  But on the other hand, you need to walk all aliases of the
>>>>>> symbol by call_for_symbol_and_aliases.
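>>>>>>
>>>>>> An untested sketch of what I mean (the callback returns true to
>>>>>> stop the walk, so the wrapper succeeds only when every referring
>>>>>> function has the flag; the callback name is made up):
>>>>>>
>>>>>> static bool
>>>>>> referring_fn_not_vectorized_p (varpool_node *vnode, void *)
>>>>>> {
>>>>>>   ipa_ref *ref;
>>>>>>   for (int i = 0; vnode->iterate_referring (i, ref); i++)
>>>>>>     if (cgraph_node *cnode = dyn_cast<cgraph_node *> (ref->referring))
>>>>>>       if (!opts_for_fn (cnode->decl)->x_flag_tree_loop_vectorize)
>>>>>>         return true;
>>>>>>   return false;
>>>>>> }
>>>>>>
>>>>>> static bool
>>>>>> increase_alignment_p (varpool_node *vnode)
>>>>>> {
>>>>>>   return !vnode->call_for_symbol_and_aliases
>>>>>>            (referring_fn_not_vectorized_p, NULL, true);
>>>>>> }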
>>>>>>> +
>>>>>>> +  for (int i = 0; vnode->iterate_referring (i, ref); i++)
>>>>>>> +    if (cgraph_node *cnode = dyn_cast<cgraph_node *> (ref->referring))
>>>>>>> +      {
>>>>>>> +        struct cl_optimization *opts = opts_for_fn (cnode->decl);
>>>>>>> +        if (!opts->x_flag_tree_loop_vectorize)
>>>>>>> +          return false;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  return true;
>>>>>>> +}
>>>>>>> +
>>>>>>>  /* Entry point to increase_alignment pass.  */
>>>>>>>  static unsigned int
>>>>>>>  increase_alignment (void)
>>>>>>> @@ -914,9 +942,12 @@ increase_alignment (void)
>>>>>>>        tree decl = vnode->decl;
>>>>>>>        unsigned int alignment;
>>>>>>>
>>>>>>> -      if ((decl_in_symtab_p (decl)
>>>>>>> -           && !symtab_node::get (decl)->can_increase_alignment_p ())
>>>>>>> -          || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl))
>>>>>>> +      if (!vnode->section_anchor
>>>>>>> +          || (decl_in_symtab_p (decl)
>>>>>>> +              && !symtab_node::get (decl)->can_increase_alignment_p ())
>>>>>>> +          || DECL_USER_ALIGN (decl)
>>>>>>> +          || DECL_ARTIFICIAL (decl)
>>>>>>> +          || !increase_alignment_p (vnode))
>>>>>>
>>>>>> Incrementally we probably should do more testing whether the variable
>>>>>> looks like something that can be vectorized, i.e. it contains an array,
>>>>>> has its address taken, or the accesses are array accesses within a loop.
>>>>>> This can be done by the analysis phase of the IPA pass inspecting the
>>>>>> function bodies.
>>>>> Thanks, I will try to check that array accesses are within a loop in a
>>>>> followup patch.
>>>>> I was wondering if we could treat a homogeneous global struct (having
>>>>> members of one type) as a global array of that type and increase its
>>>>> alignment if required?
>>>>>>
>>>>>> I think it is a significant waste to bump up everything, including
>>>>>> error messages etc.  At least on i386 the effect of various alignment
>>>>>> settings on the firefox data segment is very visible.
>>>>> Um, for a start, would it be OK to check if all functions referencing
>>>>> the variable have attribute noreturn, and in that case skip increasing
>>>>> the alignment?  I suppose error functions would have attribute noreturn
>>>>> set?
>>>>>>
>>>>>> Looks OK to me otherwise.  Please send the updated patch.
>>>>> I have done the changes in the attached patch (stage-1 built).
>>>>> I am not sure what to return from the callback function and
>>>>> arbitrarily chose to return true.
>>>> Hi,
>>>> In this version (stage-1 built), I added read/write summaries which
>>>> stream those variables whose alignment we want to increase.
>>>>
>>>> I have a few questions:
>>>>
>>>> a) To check if a global var is used inside a loop, I obtained the
>>>> corresponding ipa_ref and checked loop_containing_stmt (ref->stmt);
>>>> however, it returned non-NULL even when ref->stmt was not inside a
>>>> loop, for instance:
>>>> int a[32];
>>>> int f() { int x = a[0]; return x; }
>>>> How should I check whether a stmt is inside a loop?
>>>>
>>>> b) Is it necessary during WPA to check if the function has
>>>> flag_tree_loop_vectorize set, given that during the analysis phase in
>>>> increase_alignment_generate_summary() I already check if cnode has
>>>> flag_tree_loop_vectorize set?
>>>>
>>>> c) In LTO_increase_alignment_section, the following is streamed:
>>>> i) a "count" of type uhwi, representing the number of variables
>>>> ii) the decls corresponding to the variables
>>>> The variables are then obtained during read_summary using
>>>> symtab_node::get_create().
>>>> Since the decls for varpool_nodes would already be streamed in
>>>> LTO_section_decls, I was wondering if I could somehow refer to the
>>>> decls in that section to avoid streaming the decls twice?
>>> Hi,
>>> In this version, the variable is streamed if it occurs within a loop
>>> or its address is taken.
>>> To check if a stmt is inside a loop I am using:
>>>
>>> struct loop *l = loop_containing_stmt (ref->stmt);
>>> if (l != DECL_STRUCT_FUNCTION (cnode->decl)->x_current_loops->tree_root)
>>>   vars->add (vnode);
>>>
>>> Is this correct?
>>>
>>> I came across an unexpected issue in my previous patch with
>>> -ffat-lto-objects.
>>> I am allocating vars = new hash_set<varpool_node *> () in
>>> generate_summary() and freeing it in write_summary().
>>> However, with -ffat-lto-objects it appears execute() gets called after
>>> write_summary() during LGEN, and we hit a segfault in execute() at:
>>> for (hash_set<varpool_node *>::iterator it = vars.begin ();
>>>      it != vars.end (); it++)
>>>   { ... }
>>> because write_summary() has freed vars.
>>> To work around the issue, I gated the call that frees vars in
>>> write_summary() on flag_fat_lto_objects; I am not sure if that's a
>>> good solution.
>>>
>>> Cross-tested on arm*-*-*, aarch64*-*-*.
>>>
>>> Thanks,
>>> Prathamesh
>>>>
>>>> Thanks,
>>>> Prathamesh
>>>>>
>>>>> Thanks,
>>>>> Prathamesh
>>>>>>
>>>>>> Honza
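>>>
>>> P.S. Regarding question (a) above: as far as I can tell,
>>> loop_containing_stmt never returns NULL once loop structures are
>>> built, because a statement outside every real loop still belongs to
>>> the root loop (number 0) that represents the whole function body.
>>> Checking loop_outer, which is NULL only for the root loop, should be
>>> equivalent to the tree_root comparison above.  A minimal, untested
>>> sketch (the helper name is made up):
>>>
>>> /* True if STMT is inside a real loop, not just the root loop.  */
>>> static bool
>>> stmt_in_real_loop_p (gimple *stmt)
>>> {
>>>   struct loop *l = loop_containing_stmt (stmt);
>>>   return l != NULL && loop_outer (l) != NULL;
>>> }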