Re: [AArch64] Add scheduling and cost models for Exynos M1

Evandro Menezes Thu, 29 Oct 2015 15:33:13 -0700

Hi, James.

On 10/28/2015 05:36 AM, James Greenhalgh wrote:

On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:

This patch adds the scheduling and cost models for Exynos M1.


Though it's a rather large patch, much of it is the DFA model for the
pipeline. Still, I'd appreciate any feedback.

Please, commit if it's alright.

Hi Evandro,

Thanks for the patch, I have some comments.

To ease review, could I ask you to turn this into a patch series? Roughly
structured as so:

   1/4: Add the Exynos-M1 cost models.
   2/4: Add the Exynos M1 scheduling model.
   3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
   4/4: Add the extra tuning heuristics.

Will do.

Your support is missing a critical hunk for AArch64, there should be an

   (include "../arm/exynos-m1.md")

in aarch64.md to get this working.

This is a fairly large pipeline description (add (automata_option "stats")
to the .md file):

  Automaton `exynos_m1'
     62320 NDFA states,          489094 NDFA arcs

From experience, you get little benefit from such a complex model, but you
do slow bootstrap times. It isn't for me to say where the model can be
trimmed (I don't have access to documentation for the Exynos-M1), but
you may find it useful to split out the SIMD/FP automaton, and look at whether
your modelling of long latency instructions is entirely necessary. Have a
look at the Cortex-A57 and Cortex-A53 for some examples of what I mean.

For comparison, here are the stats for Cortex-A53 and Cortex-A57:

  Automaton `cortex_a53'
     281 NDFA states,           1158 NDFA arcs
  Automaton `cortex_a53_advsimd'
     9072 NDFA states,          49572 NDFA arcs
  Automaton `cortex_a57'
     764 NDFA states,           3600 NDFA arcs
  Automaton `cortex_a57_cx'
     204 NDFA states,            864 NDFA arcs

Splitting the automaton in two, one for generic and the other for FP,
brought the DFA down to 1520 states and 10170 arcs, and 50 states and
174 arcs. These figures seem reasonable to me, but I'll have to test this
split further.

@@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options *opts)
                         opts->x_param_values,
                         global_options_set.x_param_values);

+ /* Adjust the heuristics for Exynos M1. */

+  if (selected_cpu->sched_core == exynosm1)

I think it would be preferable to pull these tuning parameters into
the target structures somehow, rather than guarding them off by specific
CPUs.

+    {
+      /* Increase the maximum peeling limit.  */
+      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                             400,
+                             opts->x_param_values,
+                            global_options_set.x_param_values);
+

Actually, I observed that increasing the peeling limit is also
beneficial for A57. I didn't try A53, but I guess that it wouldn't hurt
it. Given that this also benefits ThunderX, I wonder if this override
could be applied always for AArch64.

+      /* Set the L1 cache line size.  */
+      maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
+                             64,
+                             opts->x_param_values,
+                            global_options_set.x_param_values);
+    }
+
    aarch64_override_options_after_change_1 (opts);
  }

Is adding a new member to tune_param the right place to add the line
size for each CPU?
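
For illustration only, one way the two overrides above could be driven by
per-core data rather than a CPU check is sketched below. This is a
standalone C sketch, not GCC code: the struct names, field names, and
helper functions are all hypothetical, and the "do not override a value
the user set" behaviour is an assumption modelled on how
maybe_set_param_value is used in the hunk. A zero field is taken to mean
"keep the default".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a per-core tuning structure.
   The field names are illustrative, not GCC's.  */
struct tune_sketch
{
  int max_completely_peeled_insns;  /* 0 means "keep the default".  */
  int l1_cache_line_size;           /* In bytes; 0 means "keep the default".  */
};

/* Hypothetical stand-in for the parameter table plus "set by user" flags.  */
struct params_sketch
{
  int max_completely_peeled_insns;
  int l1_cache_line_size;
  int peel_set_by_user;
  int line_set_by_user;
};

/* Per-core entries; the Exynos M1 values are the ones from the patch.  */
static const struct tune_sketch exynosm1_tune_sketch = { 400, 64 };
static const struct tune_sketch generic_tune_sketch = { 0, 0 };

/* Assumed behaviour: apply the tuned value only when the core provides
   one and the user did not set the parameter explicitly.  */
static void
maybe_set_sketch (int *value, int set_by_user, int tuned)
{
  if (tuned != 0 && !set_by_user)
    *value = tuned;
}

/* Apply every per-core override without testing for a specific CPU.  */
void
apply_tuning_sketch (const struct tune_sketch *tune, struct params_sketch *p)
{
  assert (tune != NULL && p != NULL);
  maybe_set_sketch (&p->max_completely_peeled_insns, p->peel_set_by_user,
                    tune->max_completely_peeled_insns);
  maybe_set_sketch (&p->l1_cache_line_size, p->line_set_by_user,
                    tune->l1_cache_line_size);
}
```

With this shape, the override code has no core-specific branch; a core
that wants different values only changes its table entry.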

@@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
      return float_type_node;
    return NULL_TREE;
  }
+
+/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
+
+static unsigned int
+aarch64_case_values_threshold (void)
+{
+  /* For Exynos M1, raise the bar for using jump tables.  */
+  if (selected_cpu->sched_core == exynosm1
+      && optimize > 2)
+    return 48;

Likewise, I think this should end up in the per-core tuning structures
rather than masked off by selected_cpu->sched_core == exynosm1.

+  else
+    return default_case_values_threshold ();
+}
+
  #undef TARGET_ADDRESS_COST
  #define TARGET_ADDRESS_COST aarch64_address_cost
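
As a rough illustration of the suggestion to move the jump-table threshold
into the per-core tuning data, the standalone C sketch below reads the
threshold from a table entry instead of comparing against a specific core.
Everything here is hypothetical: the struct, its field, the function names,
and the placeholder default (the real default comes from
default_case_values_threshold (), whose value is not reproduced here). The
"optimize > 2" condition is kept from the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-core tuning entry; 0 means "use the default".  */
struct case_tune_sketch
{
  unsigned int case_values_threshold;
};

/* The value 48 is the one the patch uses for Exynos M1.  */
static const struct case_tune_sketch exynosm1_case_tune = { 48 };
static const struct case_tune_sketch generic_case_tune = { 0 };

/* Placeholder for default_case_values_threshold (); the returned value
   is illustrative only.  */
static unsigned int
default_threshold_sketch (void)
{
  return 4;
}

/* Return the per-core threshold when one is given and the optimization
   level is above 2; otherwise fall back to the default.  */
unsigned int
case_values_threshold_sketch (const struct case_tune_sketch *tune,
                              int optimize_level)
{
  assert (tune != NULL);
  if (optimize_level > 2 && tune->case_values_threshold != 0)
    return tune->case_values_threshold;
  return default_threshold_sketch ();
}
```

A core without a preference leaves the field at zero and gets the default
behaviour unchanged.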

Thanks,
James

Thank you,

--
Evandro Menezes
