On 12/06/17 11:50, Wilco Dijkstra wrote:
> Currently the FP reassociation width is set to 4 on AArch64. On recent
> GCCs this has become more aggressive in splitting expressions. This means
> many FMAs are split into FMUL and FADD. The reassociation increases register
> pressure, in some benchmarks so much that inner loops start to spill.
> This results in larger, slower code. Benchmarking FP reassociation width=1
> showed a ~0.5% gain on SPECFP2006 and similar gains on other benchmarks,
> so change it to 1.
>

Why 1 and not 2? Many processors have 2 FP pipes, and forcing this down
to a sequential stream is not obviously the right thing. If
reassociation is causing excess spilling, then the right fix for that is
to look at the pressure model, not hammer the problem away.

R.

> Passes regress & bootstrap, OK for commit?
>
> ChangeLog:
> 2017-06-12  Wilco Dijkstra  <wdijk...@arm.com>
>
>     * gcc/config/aarch64/aarch64.c
>     (generic_tuning): Set fp_reassoc_width to 1.
>     (cortexa35_tunings): Likewise.
>     (cortexa53_tunings): Likewise.
>     (cortexa57_tunings): Likewise.
>     (cortexa72_tunings): Likewise.
>     (cortexa73_tunings): Likewise.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 72a758642743025ac8974c8f7ad4c44c31a474d5..0998bf37b2abf399277d2f2a539295506085209f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -541,7 +541,7 @@ static const struct tune_params generic_tunings =
>    4, /* jump_align. */
>    8, /* loop_align. */
>    2, /* int_reassoc_width. */
> -  4, /* fp_reassoc_width. */
> +  1, /* fp_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -567,7 +567,7 @@ static const struct tune_params cortexa35_tunings =
>    8, /* jump_align. */
>    8, /* loop_align. */
>    2, /* int_reassoc_width. */
> -  4, /* fp_reassoc_width. */
> +  1, /* fp_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -593,7 +593,7 @@ static const struct tune_params cortexa53_tunings =
>    8, /* jump_align. */
>    8, /* loop_align. */
>    2, /* int_reassoc_width. */
> -  4, /* fp_reassoc_width. */
> +  1, /* fp_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -619,7 +619,7 @@ static const struct tune_params cortexa57_tunings =
>    8, /* jump_align. */
>    8, /* loop_align. */
>    2, /* int_reassoc_width. */
> -  4, /* fp_reassoc_width. */
> +  1, /* fp_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -645,7 +645,7 @@ static const struct tune_params cortexa72_tunings =
>    8, /* jump_align. */
>    8, /* loop_align. */
>    2, /* int_reassoc_width. */
> -  4, /* fp_reassoc_width. */
> +  1, /* fp_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
> @@ -671,7 +671,7 @@ static const struct tune_params cortexa73_tunings =
>    8, /* jump_align. */
>    8, /* loop_align. */
>    2, /* int_reassoc_width. */
> -  4, /* fp_reassoc_width. */
> +  1, /* fp_reassoc_width. */
>    1, /* vec_reassoc_width. */
>    2, /* min_div_recip_mul_sf. */
>    2, /* min_div_recip_mul_df. */
>
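
For readers following the width 1 versus width 2 question, here is a rough source-level sketch of the trade-off. It is a hand-written analogy, not output from GCC's reassociation pass, and the function names are hypothetical. With width 1 the sum of products is one dependent chain in which each step can be a single FMA; with width 2 it becomes two independent chains that could feed two FP pipes, at the cost of one more live value and a final add. The two forms can round differently in general, which is why GCC only reassociates FP expressions under options such as -ffast-math.

```c
/* Width 1: a single dependent chain.  After the first multiply, each
   step can be one fused multiply-add on the same accumulator.  */
double sum_products_width1(const double x[4], const double y[4])
{
    double t = x[0] * y[0];
    t += x[1] * y[1];
    t += x[2] * y[2];
    t += x[3] * y[3];
    return t;
}

/* Width 2: two independent chains joined by a final add.  The chains
   can issue in parallel, but two accumulators are live at once.  */
double sum_products_width2(const double x[4], const double y[4])
{
    double t0 = x[0] * y[0];
    double t1 = x[2] * y[2];
    t0 += x[1] * y[1];
    t1 += x[3] * y[3];
    return t0 + t1;
}
```

Wider settings extend the same pattern: more independent chains, more live values, and more plain multiplies and adds in place of FMAs.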