Re: [AArch64] Add scheduling and cost models for Exynos M1

Evandro Menezes Wed, 28 Oct 2015 14:58:04 -0700

Andrew,

I need to do more investigation WRT prefetching, especially Kyrill'srecent patch.

The change to the peeling limit also benefits A57. If James or Marcuscould confirm this, I'd be glad to add it to all AArch64 targets,assuming that Xgene would be fine with it too.


Thank you,

--
Evandro Menezes

On 10/28/2015 05:40 AM, Andrew Pinski wrote:

On Wed, Oct 28, 2015 at 6:36 PM, James Greenhalgh
<james.greenha...@arm.com> wrote:

On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:

This patch adds the scheduling and cost models for Exynos M1.

Though it?s a rather large patch, much of it is the DFA model for the
pipeline.? Still, I?d appreciate any feedback.

Please, commit if it?s alright.

Hi Evandro,

Thanks for the patch, I have some comments.

To ease review, could I ask you to turn this in to a patch series? Roughly
structured as so:

   1/4: Add the Exynos-M1 cost models.
   2/4: Add the Exynos M1 scheduling model.
   3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
   4/4: Add the extra tuning heuristics.

Your support is missing a critical hunk for AArch64, there should be an

   (include "../arm/exynos-m1.md")

in aarch64.md to get this working.

This is a fairly large pipeline description (add (automata_option "stats")
to the .md file):

  Automaton `exynos_m1'
     62320 NDFA states,          489094 NDFA arcs

 From experience, you get little benefit from such a complex model, but you
do slow bootstrap times. It isn't for me to say where the model can be
trimmed (I don't have access to documentation for the Exynos-M1), but
you may find it useful to split out the SIMD/FP automaton, and look at whether
your modelling of long latency instructions is entirely neccesary. Have a
look at the Cortex-A57 and Cortex-A53 for some examples of what I mean.

For comparison, here are the stats for Cortex-A53 and Cortex-A57:

  Automaton `cortex_a53'
     281 NDFA states,           1158 NDFA arcs
  Automaton `cortex_a53_advsimd'
     9072 NDFA states,          49572 NDFA arcs
  Automaton `cortex_a57'
     764 NDFA states,           3600 NDFA arcs
  Automaton `cortex_a57_cx'
     204 NDFA states,            864 NDFA arcs

@@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options 
*opts)
                        opts->x_param_values,
                        global_options_set.x_param_values);

+  /* Adjust the heuristics for Exynos M1.  */
+  if (selected_cpu->sched_core == exynosm1)

I think it would be preferable to pull these tuning parameters in to
the target structures somehow, rather than guarding them off by specific
CPUs.

+    {
+      /* Increase the maximum peeling limit.  */
+      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
+                             400,
+                             opts->x_param_values,
+                          global_options_set.x_param_values);
+
+      /* Set the L1 cache line size.  */
+      maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
+                             64,
+                             opts->x_param_values,
+                          global_options_set.x_param_values);
+    }
+
    aarch64_override_options_after_change_1 (opts);
  }

I have a patch for the L1 cache size (prefetch) infastructure which
sets it via tunning parameters but I have not had time to submit it
yet.
Also Peeling parameter changes helps ThunderX too.

Thanks,
Andrew

@@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
      return float_type_node;
    return NULL_TREE;
  }
+
+/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
+
+static unsigned int
+aarch64_case_values_threshold (void)
+{
+  /* For Exynos M1, raise the bar for using jump tables.  */
+  if (selected_cpu->sched_core == exynosm1
+      && optimize > 2)
+    return 48;

Likewise, I think this should end up in the per-core tuning structures
rather than masked off by selected_cpu->sched_core == exynosm1.

+  else
+    return default_case_values_threshold ();
+}
+
  #undef TARGET_ADDRESS_COST
  #define TARGET_ADDRESS_COST aarch64_address_cost

Thanks,
James

Re: [AArch64] Add scheduling and cost models for Exynos M1

Reply via email to