On Tue, Oct 27, 2015 at 06:12:48PM -0500, Evandro Menezes wrote:
> This patch adds the scheduling and cost models for Exynos M1.
> 
> Though it?s a rather large patch, much of it is the DFA model for the
> pipeline.? Still, I?d appreciate any feedback.
> 
> Please, commit if it?s alright.

Hi Evandro,

Thanks for the patch, I have some comments.

To ease review, could I ask you to turn this in to a patch series? Roughly
structured as so:

  1/4: Add the Exynos-M1 cost models.
  2/4: Add the Exynos M1 scheduling model.
  3/4: Add the infrastructure for TARGET_CASE_VALUES_THRESHOLD.
  4/4: Add the extra tuning heuristics.

Your support is missing a critical hunk for AArch64, there should be an

  (include "../arm/exynos-m1.md")

in aarch64.md to get this working.

This is a fairly large pipeline description (add (automata_option "stats")
to the .md file):

 Automaton `exynos_m1' 
    62320 NDFA states,          489094 NDFA arcs

>From experience, you get little benefit from such a complex model, but you
do slow bootstrap times. It isn't for me to say where the model can be
trimmed (I don't have access to documentation for the Exynos-M1), but
you may find it useful to split out the SIMD/FP automaton, and look at whether
your modelling of long latency instructions is entirely neccesary. Have a
look at the Cortex-A57 and Cortex-A53 for some examples of what I mean.

For comparison, here are the stats for Cortex-A53 and Cortex-A57:

 Automaton `cortex_a53'
    281 NDFA states,           1158 NDFA arcs
 Automaton `cortex_a53_advsimd'
    9072 NDFA states,          49572 NDFA arcs
 Automaton `cortex_a57'
    764 NDFA states,           3600 NDFA arcs 
 Automaton `cortex_a57_cx'
    204 NDFA states,            864 NDFA arcs 

> @@ -7672,6 +7737,22 @@ aarch64_override_options_internal (struct gcc_options 
> *opts)
>                        opts->x_param_values,
>                        global_options_set.x_param_values);
>  
> +  /* Adjust the heuristics for Exynos M1.  */
> +  if (selected_cpu->sched_core == exynosm1)

I think it would be preferable to pull these tuning parameters in to
the target structures somehow, rather than guarding them off by specific
CPUs.

> +    {
> +      /* Increase the maximum peeling limit.  */
> +      maybe_set_param_value (PARAM_MAX_COMPLETELY_PEELED_INSNS,
> +                             400,
> +                             opts->x_param_values,
> +                          global_options_set.x_param_values);
> +
> +      /* Set the L1 cache line size.  */
> +      maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
> +                             64,
> +                             opts->x_param_values,
> +                          global_options_set.x_param_values);
> +    }
> +
>    aarch64_override_options_after_change_1 (opts);
>  }
>  


> @@ -13382,6 +13463,20 @@ aarch64_promoted_type (const_tree t)
>      return float_type_node;
>    return NULL_TREE;
>  }
> +
> +/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
> +
> +static unsigned int
> +aarch64_case_values_threshold (void)
> +{
> +  /* For Exynos M1, raise the bar for using jump tables.  */
> +  if (selected_cpu->sched_core == exynosm1
> +      && optimize > 2)
> +    return 48;

Likewise, I think this should end up in the per-core tuning structures
rather than masked off by selected_cpu->sched_core == exynosm1.

> +  else
> +    return default_case_values_threshold ();
> +}
> +
>  #undef TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST aarch64_address_cost
>  

Thanks,
James

Reply via email to