Wilco Dijkstra <wilco.dijks...@arm.com> writes:
> Enable the most basic form of compare-branch fusion since various CPUs
> support it.  This has no measurable effect on cores which don't support
> branch fusion, but increases fusion opportunities on cores which do.
If you're able to say for the record which cores you tested, then
that'd be good.

> Bootstrapped on AArch64, OK for commit?
>
> ChangeLog:
> 2019-12-24  Wilco Dijkstra  <wdijk...@arm.com>
>
>	* config/aarch64/aarch64.c (generic_tunings): Add branch fusion.
>	(neoversen1_tunings): Likewise.

OK, thanks.  I agree there doesn't seem to be an obvious reason why
this would pessimise any cores significantly.  And it looked from a
quick check like all AArch64 cores give these compares the lowest
in-use latency (as expected).  We can revisit this if anyone finds any
counterexamples.

Richard

> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index a3b18b381e1748f8fe5e522bdec4f7c850821fe8..1c32a3543bec4031cc9b641973101829c77296b5 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -726,7 +726,7 @@ static const struct tune_params generic_tunings =
>    SVE_NOT_IMPLEMENTED, /* sve_width  */
>    4, /* memmov_cost  */
>    2, /* issue_rate  */
> -  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
>    "16:12",	/* function_align.  */
>    "4",	/* jump_align.  */
>    "8",	/* loop_align.  */
> @@ -1130,7 +1130,7 @@ static const struct tune_params neoversen1_tunings =
>    SVE_NOT_IMPLEMENTED, /* sve_width  */
>    4, /* memmov_cost  */
>    3, /* issue_rate  */
> -  AARCH64_FUSE_AES_AESMC, /* fusible_ops  */
> +  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH), /* fusible_ops  */
>    "32:16",	/* function_align.  */
>    "32:16",	/* jump_align.  */
>    "32:16",	/* loop_align.  */
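As a minimal sketch of the kind of pair AARCH64_FUSE_CMP_BRANCH targets
(not part of the patch; the function name and the expected assembly are
illustrative assumptions): a flag-setting compare immediately followed
by a conditional branch.  With the flag included in fusible_ops, the
backend's macro-fusion hook (aarch64_macro_fusion_pair_p) tells the
scheduler to keep such pairs adjacent so fusion-capable cores can
combine them.

/* Illustrative only: the equality test against "key" is expected to
   compile on AArch64 to a register-register "cmp" immediately followed
   by a conditional branch (e.g. "cmp" then "b.eq"), which is exactly
   the pair the scheduler now tries not to separate.  */
int
find_key (const int *p, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (p[i] == key)
      return i;
  return -1;
}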