On 28/10/2011, at 11:43 PM, Richard Guenther wrote:

> On Fri, Oct 28, 2011 at 1:05 AM, Maxim Kuvyrkov <ma...@codesourcery.com> 
> wrote:
>> 
>> OK for trunk?
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index f056d3d..4738b28 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2416,7 +2416,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
>            tree_lowering_passes (fndecl);
>            bitmap_obstack_initialize (NULL);
>            if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> -             execute_pass_list (pass_early_local_passes.pass.sub);
> +             execute_early_local_passes_for_current_function ();
>            bitmap_obstack_release (NULL);
>            pop_cfun ();
>            current_function_decl = NULL;
> @@ -2441,7 +2441,7 @@ cgraph_add_new_function (tree fndecl, bool lowered)
>        gimple_register_cfg_hooks ();
>        bitmap_obstack_initialize (NULL);
>        if (!gimple_in_ssa_p (DECL_STRUCT_FUNCTION (fndecl)))
> -         execute_pass_list (pass_early_local_passes.pass.sub);
> +         execute_early_local_passes_for_current_function ();
> 
> I think these should only execute the lowering pieces of early local passes,
> let me see if that's properly split ...
> 
> @@ -255,7 +255,7 @@ cgraph_process_new_functions (void)
>              /* When not optimizing, be sure we run early local passes anyway
>                 to expand OMP.  */
>              || !optimize)
> -           execute_pass_list (pass_early_local_passes.pass.sub);
> +           execute_early_local_passes_for_current_function ();
> 
> similar.

In the case of tree_lowering_passes, I agree.  But with the calls to 
early_local_passes from cgraph_add_new_function/cgraph_process_new_functions we 
ought to run the full list of optimizations to make the bodies of new functions 
ready for inlining into existing functions.  Note that cgraph_add_new_function() 
explicitly calls tree_lowering_passes and then invokes early_local_passes.

What do you think?  If we want to pursue this, it should be in a separate patch 
anyway.

> 
> About all this -suffix stuff, I'd like to have the iterations simply re-use
> the existing dump-files, thus statically sub-divide pass_early_local_passes
> like

I considered this when I was implementing and debugging the iterative support, 
and concluded that it is better to keep the dumps for the main and iterative 
parts separate.  These are two different IPA passes, and their sub-passes 
should use separate dumps.

Given that the set of sub-passes of early_local_passes_main differs 
substantially from the set of sub-passes of early_local_passes_iter (the former 
has a bunch of init/expand/lower passes in it), mixing the dumps would be 
mighty confusing.  Without knowing the exact ordering of the passes, one cannot 
reconstruct a chronological picture of the optimizations.  Keeping the dumps of 
sub-passes of different IPA passes separate makes the optimization ordering 
much easier to follow.

> 
> NEXT_PASS (pass_early_local_lowering_passes);
>  {
>      NEXT_PASS (pass_fixup_cfg);
>      NEXT_PASS (pass_init_datastructures);
>      NEXT_PASS (pass_expand_omp);
> 
>      NEXT_PASS (pass_referenced_vars);
>      NEXT_PASS (pass_build_ssa);
>      NEXT_PASS (pass_lower_vector);
>      NEXT_PASS (pass_early_warn_uninitialized);
>  }
> /* The following you maybe iterate.  */
> NEXT_PASS (pass_early_local_optimizations);
>  {
>      NEXT_PASS (pass_rebuild_cgraph_edges);
>      NEXT_PASS (pass_inline_parameters);
>      NEXT_PASS (pass_early_inline);
>      NEXT_PASS (pass_all_early_optimizations);
>        {
>          struct opt_pass **p = &pass_all_early_optimizations.pass.sub;
>          NEXT_PASS (pass_remove_cgraph_callee_edges);
>          NEXT_PASS (pass_rename_ssa_copies);
>          NEXT_PASS (pass_ccp);
>          NEXT_PASS (pass_forwprop);
>          /* pass_build_ealias is a dummy pass that ensures that we
>             execute TODO_rebuild_alias at this point.  Re-building
>             alias information also rewrites no longer addressed
>             locals into SSA form if possible.  */
>          NEXT_PASS (pass_build_ealias);
>          NEXT_PASS (pass_sra_early);
>          NEXT_PASS (pass_fre);
>          NEXT_PASS (pass_copy_prop);
>          NEXT_PASS (pass_merge_phi);
>          NEXT_PASS (pass_cd_dce);
>          NEXT_PASS (pass_early_ipa_sra);
>          NEXT_PASS (pass_tail_recursion);
>          NEXT_PASS (pass_convert_switch);
>          NEXT_PASS (pass_cleanup_eh);
>          NEXT_PASS (pass_profile);
>          NEXT_PASS (pass_local_pure_const);
>       }
>   }
> NEXT_PASS (pass_late_early_local_optimizations)
>  {
>          /* Split functions creates parts that are not run through
>             early optimizations again.  It is thus good idea to do this
>             late.  */
>          NEXT_PASS (pass_split_functions);
>      NEXT_PASS (pass_release_ssa_names);
>      NEXT_PASS (pass_rebuild_cgraph_edges);
>      NEXT_PASS (pass_inline_parameters);
>  }

This simple division into three stages will not work.  On the one hand, one 
needs to add fixup/cleanup passes at the beginning and end of every SIMPLE_IPA 
pass; without them, cgraph_verify() gets upset very easily.  On the other hand, 
when iterative optimizations are not enabled and there is no need to decouple 
the main and late parts, these additional fixup/cleanup passes are just 
unnecessary overhead.
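
Just to illustrate (a rough sketch, not code from either your proposal or my 
patch -- the exact set of boundary passes would need experimenting), every 
seam between the new SIMPLE_IPA passes ends up looking something like:

  NEXT_PASS (pass_early_local_lowering_passes);
    {
      NEXT_PASS (pass_fixup_cfg);             /* keep the cgraph verifier
                                                 happy on entry */
      ...
      NEXT_PASS (pass_rebuild_cgraph_edges);  /* and again before handing
                                                 over to the next IPA pass */
    }
  NEXT_PASS (pass_early_local_optimizations);
    {
      NEXT_PASS (pass_fixup_cfg);             /* repeated at every boundary,
                                                 even when not iterating */
      ...
    }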

Also, the above division effectively creates three separate SIMPLE_IPA passes, 
and I don't know what the effect of that would be.

The pass formation and scheduling in the patch I posted may not look as pretty 
as what you describe above, but it is reasonably clean and easy to maintain.  
One only needs to change the passes in the main part, and those changes are 
automatically picked up in the iterative and late parts.

> 
> +++ b/gcc/toplev.c
> @@ -1228,7 +1228,6 @@ general_init (const char *argv0)
>   /* This must be done after global_init_params but before argument
>      processing.  */
>   init_ggc_heuristics();
> -  init_optimization_passes ();
>   statistics_early_init ();
>   finish_params ();
> }
> @@ -1989,6 +1988,8 @@ toplev_main (int argc, char **argv)
>                  save_decoded_options, save_decoded_options_count,
>                  UNKNOWN_LOCATION, global_dc);
> 
> +  init_optimization_passes ();
> +
>   handle_common_deferred_options ();
> 
>   init_local_tick ();
> 
> any reason for this?

Yes, this moves init_optimization_passes() after the processing of --param 
options.  It is needed to be able to use PARAM_VALUE during pass 
initialization.
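
For example (a sketch only -- the parameter name below is made up, just to 
show the dependency on --param processing):

  void
  init_optimization_passes (void)
  {
    int i, iterations;
    ...
    /* This value is only correct after --param handling, hence the move of
       the call from general_init to toplev_main.  */
    iterations = PARAM_VALUE (PARAM_EARLY_OPTIMIZATIONS_ITERATIONS);
    for (i = 0; i < iterations; i++)
      {
        /* ... chain another copy of the iterated early-local sub-list ... */
      }
    ...
  }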

> 
> +  /* Don't recurse or wonder on to the next pass when running
> +     execute_ipa_pass_list below.  */
> +  current->execute = NULL;
> +  current->next = NULL;
> 
> that looks awkward ;)  Shouldn't instead
> 
> +    execute_ipa_pass_list (current);
> 
> not do what it does?  Thus, split that into two pieces, one that we
> can iterate w/o the fiddling with current->?

The choice is between the above and duplicating the post-processing logic of 
execute_ipa_pass_list (i.e., the call to cgraph_process_new_functions) in the 
iteration loop.  I agree that the above is a tad awkward, but it is cleaner 
than copy-pasting the final part of execute_ipa_pass_list.
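
Concretely, the hand-rolled alternative would look roughly like this (a 
sketch, assuming it lives in passes.c where do_per_function_toporder is 
visible, with `sub' being the iterated GIMPLE sub-list):

  for (i = 0; i < iterations; i++)
    {
      /* Run the iterated GIMPLE sub-passes over all functions...  */
      do_per_function_toporder ((void (*)(void *)) execute_pass_list, sub);
      /* ...and then duplicate the tail of execute_ipa_pass_list's loop
         body, which is exactly the copy-paste I would like to avoid.  */
      gcc_assert (!current_function_decl);
      cgraph_process_new_functions ();
    }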

> 
> I like this variant a lot better than the last one - still it lacks any
> analysis-based justification for iteration (see my reply to Matt on
> what I discussed with Honza).

Yes, having a way to tell whether a function has changed significantly would 
be awesome.  My approach here would be to make inline_parameters report how 
much the size/time metrics of a function have changed since the previous run.  
If the change is above X%, then queue the function's callers for more 
optimizations.  Similarly, Martin's rebuild_cgraph_edges_and_devirt (when that 
goes into trunk) could queue new direct callees and the current function for 
another iteration if new direct edges were resolved.
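
As a rough sketch of what I mean (the function name and the 10% threshold are 
made up, and the inline_summary field names are from memory):

  /* After re-running inline_parameters on NODE, decide whether its callers
     should be queued for another iteration.  OLD_SIZE/OLD_TIME are the
     estimates recorded on the previous iteration.  */
  static bool
  worth_reoptimizing_callers_p (struct cgraph_node *node,
                                int old_size, int old_time)
  {
    struct inline_summary *info = inline_summary (node);

    return (abs (info->size - old_size) * 10 > old_size
            || abs (info->time - old_time) * 10 > old_time);
  }

The callers would then be pushed back onto the worklist by walking 
node->callers.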

But such analysis is well outside the scope of this patch, which adds the 
infrastructure needed to make iterative optimizations possible and reliable.

>  Thus, I don't think we want to
> merge this in its current form or in this stage1.

What is the benefit of pushing this to a later release?  If anything, merging 
the support for iterative optimizations now will allow us to consider adding 
the wonderful smartness to it later.  In the meantime, substituting that 
smartness with a knob is still a great alternative.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



