On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:
This patch adds the scheduling and cost models for Exynos M1.
Though it?s a rather large patch, much of it is the DFA model for the
pipeline.? Still, I?d appreciate any feedback.
Please, commit if it?s alright.
Hi Evandro,
Thanks for the patch, I have some comments.
To ease review, could I ask you to turn this in to a patch series? Roughly
structured as so:
1/4: Add the Exynos-M1 cost models.
2/4: Add the Exynos M1 scheduling model.
3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
4/4: Add the extra tuning heuristics.
Your support is missing a critical hunk for AArch64, there should be an
(include "../arm/exynos-m1.md")
in aarch64.md to get this working.
This is a fairly large pipeline description (add (automata_option "stats")
to the .md file):
Automaton `exynos_m1'
62320 NDFA states, 489094 NDFA arcs
From experience, you get little benefit from such a complex model, but you
do slow bootstrap times. It isn't for me to say where the model can be
trimmed (I don't have access to documentation for the Exynos-M1), but
you may find it useful to split out the SIMD/FP automaton, and look at whether
your modelling of long latency instructions is entirely neccesary. Have a
look at the Cortex-A57 and Cortex-A53 for some examples of what I mean.
For comparison, here are the stats for Cortex-A53 and Cortex-A57:
Automaton `cortex_a53'
281 NDFA states, 1158 NDFA arcs
Automaton `cortex_a53_advsimd'
9072 NDFA states, 49572 NDFA arcs
Automaton `cortex_a57'
764 NDFA states, 3600 NDFA arcs
Automaton `cortex_a57_cx'
204 NDFA states, 864 NDFA arcs
@@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options
*opts)
opts->x_param_values,
global_options_set.x_param_values);
+ /* Adjust the heuristics for Exynos M1. */
+ if (selected_cpu->sched_core == exynosm1)
I think it would be preferable to pull these tuning parameters in to
the target structures somehow, rather than guarding them off by specific
CPUs.
+ {
+ /* Increase the maximum peeling limit. */
+ maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+ 400,
+ opts->x_param_values,
+ global_options_set.x_param_values);
+
+ /* Set the L1 cache line size. */
+ maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
+ 64,
+ opts->x_param_values,
+ global_options_set.x_param_values);
+ }
+
aarch64_override_options_after_change_1 (opts);
}
@@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
return float_type_node;
return NULL_TREE;
}
+
+/* Implement TARGET_CASE_VALUES_THRESHOLD. */
+
+static unsigned int
+aarch64_case_values_threshold (void)
+{
+ /* For Exynos M1, raise the bar for using jump tables. */
+ if (selected_cpu->sched_core == exynosm1
+ && optimize > 2)
+ return 48;
Likewise, I think this should end up in the per-core tuning structures
rather than masked off by selected_cpu->sched_core == exynosm1.
+ else
+ return default_case_values_threshold ();
+}
+
#undef TARGET_ADDRESS_COST
#define TARGET_ADDRESS_COST aarch64_address_cost
Thanks,
James