ping * 3 https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html

Thanks,
Prathamesh

On 5 July 2016 at 10:53, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
> ping * 2 https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html
>
> Thanks,
> Prathamesh
>
> On 28 June 2016 at 14:49, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
>> ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html
>>
>> Thanks,
>> Prathamesh
>>
>> On 23 June 2016 at 22:51, Prathamesh Kulkarni
>> <prathamesh.kulka...@linaro.org> wrote:
>>> On 17 June 2016 at 19:52, Prathamesh Kulkarni
>>> <prathamesh.kulka...@linaro.org> wrote:
>>>> On 14 June 2016 at 18:31, Prathamesh Kulkarni
>>>> <prathamesh.kulka...@linaro.org> wrote:
>>>>> On 13 June 2016 at 16:13, Jan Hubicka <hubi...@ucw.cz> wrote:
>>>>>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>>>>>>> index ecafe63..41ac408 100644
>>>>>>> --- a/gcc/cgraph.h
>>>>>>> +++ b/gcc/cgraph.h
>>>>>>> @@ -1874,6 +1874,9 @@ public:
>>>>>>>       if we did not do any inter-procedural code movement.  */
>>>>>>>    unsigned used_by_single_function : 1;
>>>>>>>
>>>>>>> +  /* Set if -fsection-anchors is set.  */
>>>>>>> +  unsigned section_anchor : 1;
>>>>>>> +
>>>>>>>  private:
>>>>>>>    /* Assemble thunks and aliases associated to varpool node.  */
>>>>>>>    void assemble_aliases (void);
>>>>>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>>>>>>> index 4bfcad7..e75d5c0 100644
>>>>>>> --- a/gcc/cgraphunit.c
>>>>>>> +++ b/gcc/cgraphunit.c
>>>>>>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl)
>>>>>>>       it is available to notice_global_symbol.  */
>>>>>>>    node->definition = true;
>>>>>>>    notice_global_symbol (decl);
>>>>>>> +
>>>>>>> +  node->section_anchor = flag_section_anchors;
>>>>>>> +
>>>>>>>    if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl)
>>>>>>>        /* Traditionally we do not eliminate static variables when not
>>>>>>>        optimizing and when not doing toplevel reoder.  */
>>>>>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>>>>>> index f0d7196..e497795 100644
>>>>>>> --- a/gcc/common.opt
>>>>>>> +++ b/gcc/common.opt
>>>>>>> @@ -1590,6 +1590,10 @@ fira-algorithm=
>>>>>>>  Common Joined RejectNegative Enum(ira_algorithm) Var(flag_ira_algorithm) Init(IRA_ALGORITHM_CB) Optimization
>>>>>>>  -fira-algorithm=[CB|priority] Set the used IRA algorithm.
>>>>>>>
>>>>>>> +fipa-increase_alignment
>>>>>>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization
>>>>>>> +Option to gate increase_alignment ipa pass.
>>>>>>> +
>>>>>>>  Enum
>>>>>>>  Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA 
>>>>>>> algorithm %qs)
>>>>>>>
>>>>>>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) Init(1) Optimization
>>>>>>>  Enable the dependent count heuristic in the scheduler.
>>>>>>>
>>>>>>>  fsection-anchors
>>>>>>> -Common Report Var(flag_section_anchors) Optimization
>>>>>>> +Common Report Var(flag_section_anchors)
>>>>>>>  Access data in the same section from shared anchor points.
>>>>>>>
>>>>>>>  fsee
>>>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>>>>> index a0db3a4..1482566 100644
>>>>>>> --- a/gcc/config/aarch64/aarch64.c
>>>>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>>>>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void)
>>>>>>>
>>>>>>>    aarch64_register_fma_steering ();
>>>>>>>
>>>>>>> +  /* Enable increase_alignment pass.  */
>>>>>>> +  flag_ipa_increase_alignment = 1;
>>>>>>
>>>>>> I would rather enable it always on targets that do support anchors.
>>>>> AFAIK aarch64 supports section anchors.
>>>>>>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
>>>>>>> index ce9e146..7f09f3a 100644
>>>>>>> --- a/gcc/lto/lto-symtab.c
>>>>>>> +++ b/gcc/lto/lto-symtab.c
>>>>>>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node *entry)
>>>>>>>       The type compatibility checks or the completing of types has properly
>>>>>>>       dealt with most issues.  */
>>>>>>>
>>>>>>> +  /* ??? is this assert necessary ?  */
>>>>>>> +  varpool_node *v_prevailing = dyn_cast<varpool_node *> (prevailing);
>>>>>>> +  varpool_node *v_entry = dyn_cast<varpool_node *> (entry);
>>>>>>> +  gcc_assert (v_prevailing && v_entry);
>>>>>>> +  /* section_anchor of prevailing_decl wins.  */
>>>>>>> +  v_entry->section_anchor = v_prevailing->section_anchor;
>>>>>>> +
>>>>>> Other flags are merged in lto_varpool_replace_node, so please move this there.
>>>>> Ah indeed, thanks for the pointers.
>>>>> I wonder though if we need to set
>>>>> prevailing_node->section_anchor = vnode->section_anchor ?
>>>>> IIUC, the function merges flags from vnode into prevailing_node
>>>>> and removes vnode. However we want prevailing_node->section_anchor
>>>>> to always take precedence.
>>>>>>> +/* Return true if alignment should be increased for this vnode.
>>>>>>> +   This is done if every function that references/referring to vnode
>>>>>>> +   has flag_tree_loop_vectorize set.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +increase_alignment_p (varpool_node *vnode)
>>>>>>> +{
>>>>>>> +  ipa_ref *ref;
>>>>>>> +
>>>>>>> +  for (int i = 0; vnode->iterate_reference (i, ref); i++)
>>>>>>> +    if (cgraph_node *cnode = dyn_cast<cgraph_node *> (ref->referred))
>>>>>>> +      {
>>>>>>> +     struct cl_optimization *opts = opts_for_fn (cnode->decl);
>>>>>>> +     if (!opts->x_flag_tree_loop_vectorize)
>>>>>>> +       return false;
>>>>>>> +      }
>>>>>>
>>>>>> Taking the address of a function that has the vectorizer enabled probably
>>>>>> doesn't imply a need to increase the alignment of that var.  So please drop
>>>>>> the loop.
>>>>>>
>>>>>> You only want functions that read/write or take the address of the symbol.
>>>>>> But on the other hand, you need to walk all aliases of the symbol with
>>>>>> call_for_symbol_and_aliases.
>>>>>>> +
>>>>>>> +  for (int i = 0; vnode->iterate_referring (i, ref); i++)
>>>>>>> +    if (cgraph_node *cnode = dyn_cast<cgraph_node *> (ref->referring))
>>>>>>> +      {
>>>>>>> +     struct cl_optimization *opts = opts_for_fn (cnode->decl);
>>>>>>> +     if (!opts->x_flag_tree_loop_vectorize)
>>>>>>> +       return false;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  return true;
>>>>>>> +}
>>>>>>> +
>>>>>>>  /* Entry point to increase_alignment pass.  */
>>>>>>>  static unsigned int
>>>>>>>  increase_alignment (void)
>>>>>>> @@ -914,9 +942,12 @@ increase_alignment (void)
>>>>>>>        tree decl = vnode->decl;
>>>>>>>        unsigned int alignment;
>>>>>>>
>>>>>>> -      if ((decl_in_symtab_p (decl)
>>>>>>> -       && !symtab_node::get (decl)->can_increase_alignment_p ())
>>>>>>> -       || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl))
>>>>>>> +      if (!vnode->section_anchor
>>>>>>> +       || (decl_in_symtab_p (decl)
>>>>>>> +           && !symtab_node::get (decl)->can_increase_alignment_p ())
>>>>>>> +       || DECL_USER_ALIGN (decl)
>>>>>>> +       || DECL_ARTIFICIAL (decl)
>>>>>>> +       || !increase_alignment_p (vnode))
>>>>>>
>>>>>> Incrementally we probably should do more testing of whether the variable
>>>>>> looks like something that can be vectorized, i.e. it contains an array, has
>>>>>> its address taken, or the accesses are array accesses within a loop.
>>>>>> This can be done by the analysis phase of the IPA pass inspecting the
>>>>>> function bodies.
>>>>> Thanks, I will try to check whether the array accesses are within a loop in
>>>>> a followup patch.
>>>>> I was wondering if we could treat a homogeneous global struct (having
>>>>> members of one type) as a global array of that type and increase its
>>>>> alignment if required ?
>>>>>>
>>>>>> I think it is an important waste to bump up everything, including error
>>>>>> messages etc.  At least on i386 the effect of various alignment settings on
>>>>>> the firefox data segment is very visible.
>>>>> Um, for a start, would it be OK to check if all functions referencing the
>>>>> variable have attribute noreturn, and in that case skip increasing the
>>>>> alignment ?
>>>>> I suppose the error functions would have attribute noreturn set ?
>>>>>>
>>>>>> Looks OK to me otherwise.  Please send an updated patch.
>>>>> I have done the changes in the attached patch (stage-1 built).
>>>>> I am not sure what to return from the callback function and
>>>>> arbitrarily chose to return true.
>>>> Hi,
>>>> In this version (stage-1 built), I added read/write summaries which stream
>>>> the variables whose alignment we want to increase.
>>>>
>>>> I have a few questions:
>>>>
>>>> a) To check if a global var is used inside a loop, I obtained the
>>>> corresponding ipa_ref and checked loop_containing_stmt (ref->stmt); however,
>>>> it returned non-NULL even when ref->stmt was not inside a loop.
>>>> For instance:
>>>> int a[32];
>>>> int f() { int x = a[0]; return x; }
>>>> How do I check whether a stmt is inside a loop ?
>>>>
>>>> b) Is it necessary during WPA to check if the function has
>>>> flag_tree_loop_vectorize set, given that during the analysis phase in
>>>> increase_alignment_generate_summary() I already check whether cnode has
>>>> flag_tree_loop_vectorize set ?
>>>>
>>>> c) In LTO_increase_alignment_section, the following is streamed:
>>>> i) a "count" of type uhwi, representing the number of variables
>>>> ii) the decls corresponding to those variables
>>>> The variables are then obtained during read_summary using
>>>> symtab_node::get_create().
>>>> Since the decls for the varpool_nodes would already be streamed in
>>>> LTO_section_decls, I was wondering if I could somehow refer to the decls in
>>>> that section to avoid streaming them twice ?
>>> Hi,
>>> In this version, the variable is streamed if it occurs within a loop or its
>>> address is taken.
>>> To check if stmt is inside a loop I am using:
>>>
>>> struct loop *l = loop_containing_stmt (ref->stmt);
>>> if (l != DECL_STRUCT_FUNCTION (cnode->decl)->x_current_loops->tree_root)
>>>   vars->add (vnode);
>>> Is this correct ?
>>>
>>> I came across an unexpected issue in my previous patch with -ffat-lto-objects.
>>> I am allocating vars = new hash_set<varpool_node *> () in generate_summary()
>>> and freeing it in write_summary().
>>> However, with -ffat-lto-objects it appears execute() gets called after
>>> write_summary() during LGEN, and we hit a segfault in execute() at:
>>> for (hash_set<varpool_node *>::iterator it = vars->begin (); it !=
>>> vars->end (); it++)
>>>   { ... }
>>> because write_summary() has freed vars.
>>> To work around the issue, I gated the call that frees vars in write_summary()
>>> on flag_fat_lto_objects; I am not sure that is a good solution.
>>>
>>> Cross tested on arm*-*-*, aarch64*-*-*.
>>>
>>> Thanks,
>>> Prathamesh
>>>>
>>>> Thanks,
>>>> Prathamesh
>>>>>
>>>>> Thanks,
>>>>> Prathamesh
>>>>>>
>>>>>> Honza
