On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:
> This patch adds the scheduling and cost models for Exynos M1.
>
> Though it's a rather large patch, much of it is the DFA model for the
> pipeline.  Still, I'd appreciate any feedback.
>
> Please, commit if it's alright.
Hi Evandro,

Thanks for the patch, I have some comments. To ease review, could I ask
you to turn this into a patch series? Roughly structured like so:

  1/4: Add the Exynos M1 cost models.
  2/4: Add the Exynos M1 scheduling model.
  3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
  4/4: Add the extra tuning heuristics.

Your support is missing a critical hunk for AArch64; there should be an
(include "../arm/exynos-m1.md") in aarch64.md to get this working.

This is a fairly large pipeline description (add (automata_option "stats")
to the .md file):

Automaton `exynos_m1'
  62320 NDFA states, 489094 NDFA arcs

From experience, you get little benefit from such a complex model, but you
do slow bootstrap times. It isn't for me to say where the model can be
trimmed (I don't have access to documentation for the Exynos-M1), but you
may find it useful to split out the SIMD/FP automaton, and to look at
whether your modelling of long-latency instructions is entirely necessary.
Have a look at the Cortex-A57 and Cortex-A53 models for some examples of
what I mean.

For comparison, here are the stats for Cortex-A53 and Cortex-A57:

Automaton `cortex_a53'
  281 NDFA states, 1158 NDFA arcs

Automaton `cortex_a53_advsimd'
  9072 NDFA states, 49572 NDFA arcs

Automaton `cortex_a57'
  764 NDFA states, 3600 NDFA arcs

Automaton `cortex_a57_cx'
  204 NDFA states, 864 NDFA arcs

> @@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options
> *opts)
>                            opts->x_param_values,
>                            global_options_set.x_param_values);
>
> +  /* Adjust the heuristics for Exynos M1.  */
> +  if (selected_cpu->sched_core == exynosm1)

I think it would be preferable to pull these tuning parameters into the
target structures somehow, rather than guarding them off by specific CPUs.

> +    {
> +      /* Increase the maximum peeling limit.  */
> +      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
> +                             400,
> +                             opts->x_param_values,
> +                             global_options_set.x_param_values);
> +
> +      /* Set the L1 cache line size.  */
> +      maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
> +                             64,
> +                             opts->x_param_values,
> +                             global_options_set.x_param_values);
> +    }
> +
>    aarch64_override_options_after_change_1 (opts);
>  }
>
> @@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
>      return float_type_node;
>    return NULL_TREE;
>  }
> +
> +/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
> +
> +static unsigned int
> +aarch64_case_values_threshold (void)
> +{
> +  /* For Exynos M1, raise the bar for using jump tables.  */
> +  if (selected_cpu->sched_core == exynosm1
> +      && optimize > 2)
> +    return 48;

Likewise, I think this should end up in the per-core tuning structures
rather than masked off by selected_cpu->sched_core == exynosm1; see the
rough sketch at the end of this mail for the sort of thing I mean.

> +  else
> +    return default_case_values_threshold ();
> +}
> +
>  #undef TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST aarch64_address_cost
>

Thanks,
James
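
P.S. To make the "per-core tuning structures" suggestion a bit more
concrete, here is the rough shape I have in mind for the jump-table
threshold. This is completely untested, and the names (max_case_values,
the way the current tuning structure is reached) are only illustrative;
the peeling limit and L1 cache line size could be carried by similar
fields instead of checks on selected_cpu->sched_core:

  /* In aarch64-protos.h, give each core a way to describe its own
     threshold in its tuning structure.  */
  struct tune_params
  {
    /* ... existing members ...  */
    unsigned int max_case_values;  /* 0 means use the default.  */
  };

  /* Each core sets the field in its *_tunings initializer (e.g. 48 for
     Exynos M1, 0 elsewhere), and the hook just reads it back, leaving
     no CPU-specific checks in aarch64.c.  */
  static unsigned int
  aarch64_case_values_threshold (void)
  {
    if (optimize > 2
        && aarch64_tune_params->max_case_values != 0)
      return aarch64_tune_params->max_case_values;

    return default_case_values_threshold ();
  }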