On Mon, Feb 26, 2018 at 11:25 PM, James Greenhalgh
<james.greenha...@arm.com> wrote:
> On Thu, Feb 22, 2018 at 11:38:03AM +0000, Wilco Dijkstra wrote:
>> As discussed in the PR, the reassociation phase runs before FMAs are formed
>> and so can significantly reduce FMA opportunities. Although reassociation
>> could be switched off, it helps in many cases, so a better alternative is to
>> only avoid reassociation of floating point additions. This fixes the
>> testcase and gives 1% speedup on SPECFP2017, fixing the performance
>> regression.
>>
>> OK for commit?
>
> This is OK as a fairly safe fix for stage 4. We should fix reassociation
> properly in GCC 9.

It happens that on some targets doing two FMAs in parallel and one
non-FMA operation merging them is faster than chaining three FMAs...

But yes, somewhere I suggested that FMA detection should/could be
integrated with reassociation.

Richard.

> Thanks,
> James
>
>>
>> ChangeLog:
>> 2018-02-23  Wilco Dijkstra  <wdijk...@arm.com>
>>
>>     PR tree-optimization/84114
>>     * config/aarch64/aarch64.c (aarch64_reassociation_width)
>>     Avoid reassociation of FLOAT_MODE addition.
>> --
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index b3d5fde171920e5759046a4bd61cfcf9eb78d7dd..5f9541cf700aaf18c1f1ac73054614e2932781e4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -1109,15 +1109,16 @@ aarch64_min_divisions_for_recip_mul (machine_mode mode)
>>    return aarch64_tune_params.min_div_recip_mul_df;
>>  }
>>
>> +/* Return the reassociation width of treeop OPC with mode MODE.  */
>>  static int
>> -aarch64_reassociation_width (unsigned opc ATTRIBUTE_UNUSED,
>> -			     machine_mode mode)
>> +aarch64_reassociation_width (unsigned opc, machine_mode mode)
>>  {
>>    if (VECTOR_MODE_P (mode))
>>      return aarch64_tune_params.vec_reassoc_width;
>>    if (INTEGRAL_MODE_P (mode))
>>      return aarch64_tune_params.int_reassoc_width;
>> -  if (FLOAT_MODE_P (mode))
>> +  /* Avoid reassociating floating point addition so we emit more FMAs.  */
>> +  if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR)
>>      return aarch64_tune_params.fp_reassoc_width;
>>    return 1;
>>  }
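
For reference, the two shapes being compared can be sketched at source level roughly as follows. This is only an illustration, not the PR testcase: the function names dot4_chained and dot4_split are made up here, and whether each multiply-add really becomes an FMA depends on the target and on floating-point contraction being permitted.

```c
#include <assert.h>

/* Hypothetical example, not taken from the PR.
   Serial form: one multiply followed by three dependent
   multiply-adds, i.e. a chain of up to three FMAs.  */
double
dot4_chained (const double *x, const double *y)
{
  assert (x && y);
  double acc = x[0] * y[0];
  acc = x[1] * y[1] + acc;   /* FMA candidate, depends on acc.  */
  acc = x[2] * y[2] + acc;   /* FMA candidate, depends on previous.  */
  acc = x[3] * y[3] + acc;   /* FMA candidate, depends on previous.  */
  return acc;
}

/* Reassociated form (width 2): two independent multiply-add
   chains that can issue in parallel, merged by one plain
   addition that cannot be fused.  */
double
dot4_split (const double *x, const double *y)
{
  assert (x && y);
  double lo = x[1] * y[1] + x[0] * y[0];   /* FMA candidate.  */
  double hi = x[3] * y[3] + x[2] * y[2];   /* FMA candidate.  */
  return lo + hi;                          /* Non-FMA merge.  */
}
```

The chained form has the longer dependency chain but keeps every addition fusable; the split form shortens the critical path at the cost of one extra unfused add, which is the trade-off that differs between targets.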