On Mon, May 16, 2016 at 11:38:04AM +0100, Wilco Dijkstra wrote:
> ping

As this patch will change code generation for all cores (except
Exynos-M1), I'd like to hear from those with more detailed knowledge of
ThunderX, X-Gene and qdf24xx before I take this patch.

Let's give it another week or so for comments, and expand the CC list.

I wasn't quite convinced by the Cortex-A53 numbers you gave upthread.
You said:

> >  Evandro Menezes wrote:
> >
> > True, but the results when running on A53 could be quite different.
>
> GCC is ~1.2% faster on Cortex-A53 built for generic, but there is no
> difference in perlbench.

Where were these changes if not perlbench?

Thanks,
James

> ________________________________________
> From: Wilco Dijkstra
> Sent: 22 April 2016 17:15
> To: gcc-patches@gcc.gnu.org
> Cc: nd
> Subject: [PATCH][AArch64] Improve aarch64_case_values_threshold setting
> 
> GCC expands switch statements in a very simplistic way and tries to use a table
> expansion even when it is a bad idea for performance or codesize.
> GCC typically emits extremely sparse tables that contain mostly default entries
> (something which currently cannot be tuned by backends).  Additionally the
> computation of the minimum/maximum label offsets is too simplistic so the tables
> are often twice as large as necessary.
> 
> The cost of a table switch is significant due to the setup overhead, the table
> lookup (which, being sparse and large, adds unnecessary cache misses)
> and the hard-to-predict indirect jump.  Therefore it is best to avoid using a
> table unless there are many real case labels.
> 
> This patch fixes that by setting the default aarch64_case_values_threshold to
> 16 when the per-CPU tuning is not set.  On SPEC2006 this improves the switch
> heavy benchmarks GCC and perlbench both in performance (1-2%) as well as size
> (0.5-1% smaller).
> 
> OK for trunk?
> 
> ChangeLog:
> 2016-04-22  Wilco Dijkstra  <wdijk...@arm.com>
> 
>     gcc/
>         * config/aarch64/aarch64.c (aarch64_case_values_threshold):
>         Return a better case_values_threshold when optimizing.
> 
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0620f1e..a240635 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3546,7 +3546,12 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>    return aarch64_tls_referenced_p (x);
>  }
> 
> -/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
> +/* Implement TARGET_CASE_VALUES_THRESHOLD.
> +   The expansion for a table switch is quite expensive due to the number
> +   of instructions, the table lookup and hard to predict indirect jump.
> +   When optimizing for speed, with -O3 use the per-core tuning if set,
> +   otherwise use tables for > 16 cases as a tradeoff between size and
> +   performance.  */
> 
>  static unsigned int
>  aarch64_case_values_threshold (void)
> @@ -3557,7 +3562,7 @@ aarch64_case_values_threshold (void)
>        && selected_cpu->tune->max_case_values != 0)
>      return selected_cpu->tune->max_case_values;
>    else
> -    return default_case_values_threshold ();
> +    return optimize_size ? default_case_values_threshold () : 17;
>  }
> 
> 
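
For anyone following the thread, the selection logic after this patch can be
sketched as standalone C. This is only an illustration, not the GCC source:
`tune_sketch`, `case_values_threshold_sketch` and the `generic_default`
parameter are hypothetical stand-ins for `selected_cpu->tune`, the patched
function and the result of `default_case_values_threshold ()`. The
`opt_level > 2` test is an assumption taken from the "-O3" wording in the new
comment, since the hunk does not show that line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-core tuning data used in the patch.  */
struct tune_sketch {
    unsigned int max_case_values;   /* 0 means "no per-core value set" */
};

/* Sketch of the patched logic.  In GCC the optimization level and the
   size flag are globals; here they are plain parameters so the function
   can be exercised in isolation.  generic_default stands in for whatever
   default_case_values_threshold () would return.  */
static unsigned int
case_values_threshold_sketch(int opt_level, bool optimize_size,
                             const struct tune_sketch *tune,
                             unsigned int generic_default)
{
    assert(tune != NULL);

    /* Assumed condition: the per-core value only applies at -O3.  */
    if (opt_level > 2 && tune->max_case_values != 0)
        return tune->max_case_values;

    /* New in the patch: when not optimizing for size, require at least
       17 case values (that is, more than 16) before using a table.  */
    return optimize_size ? generic_default : 17;
}
```

Under these assumptions, a core with `max_case_values` set to 8 would get 8
at `-O3` and 17 at `-O2`, while a size-optimized build keeps the generic
default.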
