On Mon, May 16, 2016 at 11:38:04AM +0100, Wilco Dijkstra wrote:
> ping

As this change affects code generation for all cores (except Exynos-M1),
I'd like to hear from those with more detailed knowledge of ThunderX,
X-Gene and qdf24xx before I take this patch.
Let's give it another week or so for comments, and expand the CC list.

I wasn't quite convinced by the Cortex-A53 numbers you gave upthread. You
said:

> > Evandro Menezes wrote:
> >
> > > True, but the results when running on A53 could be quite different.
> >
> > GCC is ~1.2% faster on Cortex-A53 built for generic, but there is no
> > difference in perlbench.

Where were these changes, if not in perlbench?

Thanks,
James

> ________________________________________
> From: Wilco Dijkstra
> Sent: 22 April 2016 17:15
> To: gcc-patches@gcc.gnu.org
> Cc: nd
> Subject: [PATCH][AArch64] Improve aarch64_case_values_threshold setting
>
> GCC expands switch statements in a very simplistic way and tries to use a
> table expansion even when it is a bad idea for performance or code size.
> GCC typically emits extremely sparse tables that contain mostly default
> entries (something which currently cannot be tuned by backends).
> Additionally, the computation of the minimum/maximum label offsets is too
> simplistic, so the tables are often twice as large as necessary.
>
> The cost of a table switch is significant due to the setup overhead, the
> table lookup (which, because the table is sparse and large, adds
> unnecessary cache misses) and the hard-to-predict indirect jump. Therefore
> it is best to avoid using a table unless there are many real case labels.
>
> This patch fixes that by setting the default aarch64_case_values_threshold
> to 16 when the per-CPU tuning is not set. On SPEC2006 this improves the
> switch-heavy benchmarks GCC and perlbench both in performance (1-2%) and
> in size (0.5-1% smaller).
>
> OK for trunk?
>
> ChangeLog:
> 2016-04-22  Wilco Dijkstra  <wdijk...@arm.com>
>
> gcc/
> 	* config/aarch64/aarch64.c (aarch64_case_values_threshold):
> 	Return a better case_values_threshold when optimizing.
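To make the sparsity argument concrete, here is a small standalone C sketch (not GCC code) that computes, for a set of distinct case labels, how many slots a dense jump table spanning the smallest to the largest label would need, and how many of those slots only branch to the default label. The names `table_stats` and `compute_table_stats` are hypothetical, and the sketch deliberately does not model how GCC itself computes the table range.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: describes a dense jump table that
   spans the smallest to the largest case label.  */
struct table_stats
{
  long entries;       /* max - min + 1 slots in the table */
  long real_cases;    /* slots that hold an actual case label */
  long default_slots; /* slots that only branch to the default label */
};

/* LABELS must hold N >= 1 distinct case values, in any order.  */
struct table_stats
compute_table_stats (const long *labels, int n)
{
  assert (n > 0);
  long min = labels[0], max = labels[0];
  for (int i = 1; i < n; i++)
    {
      if (labels[i] < min)
        min = labels[i];
      if (labels[i] > max)
        max = labels[i];
    }
  struct table_stats s;
  s.entries = max - min + 1;
  s.real_cases = n;
  s.default_slots = s.entries - s.real_cases;
  return s;
}
```

For example, a switch with the five labels {1, 10, 25, 60, 100} needs a 100-slot table, of which 95 slots are default entries, while the same switch expanded as a tree of compares needs only a handful of branches.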
>
> -- 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0620f1e..a240635 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3546,7 +3546,12 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>    return aarch64_tls_referenced_p (x);
>  }
>
> -/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
> +/* Implement TARGET_CASE_VALUES_THRESHOLD.
> +   The expansion for a table switch is quite expensive due to the number
> +   of instructions, the table lookup and hard to predict indirect jump.
> +   When optimizing for speed, with -O3 use the per-core tuning if set,
> +   otherwise use tables for > 16 cases as a tradeoff between size and
> +   performance.  */
>
>  static unsigned int
>  aarch64_case_values_threshold (void)
> @@ -3557,7 +3562,7 @@ aarch64_case_values_threshold (void)
>        && selected_cpu->tune->max_case_values != 0)
>      return selected_cpu->tune->max_case_values;
>    else
> -    return default_case_values_threshold ();
> +    return optimize_size ? default_case_values_threshold () : 17;
>  }
>
>
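The decision the patched hook makes can be modelled outside GCC as below. This is a hypothetical sketch, not GCC code: `optimize_level`, `optimize_size`, `tuned_max` and `generic_default` are parameters standing in for GCC's `optimize` and `optimize_size` globals, `selected_cpu->tune->max_case_values` and the value of `default_case_values_threshold ()`. The `optimize_level > 2` test is an assumption taken from the patch comment ("with -O3 use the per-core tuning if set"), because the diff hunk does not show the first half of that condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patched aarch64_case_values_threshold; the
   parameters stand in for GCC globals and tuning fields.  */
unsigned int
model_case_values_threshold (int optimize_level, bool optimize_size,
                             unsigned int tuned_max,
                             unsigned int generic_default)
{
  /* Assumed from the patch comment: per-core tuning applies at -O3.  */
  if (optimize_level > 2 && tuned_max != 0)
    return tuned_max;
  /* Patched fallback: keep the generic default at -Os, otherwise 17.  */
  assert (generic_default > 0);
  return optimize_size ? generic_default : 17;
}

/* The hook returns the smallest number of distinct case values for which
   a jump table is preferred, so a threshold of 17 means "more than 16
   cases".  GCC's other heuristics (such as the value range relative to
   the number of cases) are not modelled here.  */
bool
model_use_jump_table (unsigned int ncases, unsigned int threshold)
{
  return ncases >= threshold;
}
```

Under this model, a 16-case switch compiled at -O2 for a core without its own tuning is expanded as a compare tree, while a 17-case switch may still use a table; at -Os the generic default is kept unchanged.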