https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                      |Added
----------------------------------------------------------------------------
     Ever confirmed|0                            |1
   Last reconfirmed|                             |2022-11-04
             Status|UNCONFIRMED                  |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org|wilco at gcc dot gnu.org

--- Comment #10 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Rama Malladi from comment #9)
> (In reply to Rama Malladi from comment #8)
> > (In reply to Wilco from comment #7)
> > > The revert results in about 0.5% loss on Neoverse N1, so it looks like the
> > > reassociation pass is still splitting FMAs into separate MUL and ADD
> > > (which is bad for narrow cores).
> >
> > Thank you for checking on N1. Did you happen to check on V1 too to reproduce
> > the perf results I had? Any other experiments/ tests I can do to help on
> > this filing? Thanks again for the debug/ fix.
>
> I ran SPEC cpu2017 fprate 1-copy benchmark built with the patch reverted and
> using option 'neoverse-n1' on the Graviton 3 processor (which has support
> for SVE). The performance was up by 0.4%, primary contributor being
> 519.lbm_r which was up 13%.

I'm seeing about 1.5% gain on Neoverse V1 and 0.5% loss on Neoverse N1. I'll
post a patch that allows per-CPU settings for FMA reassociation, so you'll get
good performance with -mcpu=native. However reassociation really needs to be
taught about the existence of FMAs.
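
For readers following the thread, the trade-off discussed above can be sketched in C. This is only an illustration, not GCC's actual output for 519.lbm_r: the function names sum_chain and sum_reassoc are hypothetical, and the instruction counts in the comments assume the compiler contracts each multiply-add pair into one FMA where it can. Reassociating floating-point sums like this is only permitted under options such as -ffast-math.

```c
/* Serial form: every multiply feeds an add that depends on the previous
   result, so a compiler can emit a chain of three FMAs. */
double sum_chain(double a, double b, double c, double d,
                 double e, double f, double g)
{
    double t = a * b + g;   /* FMA */
    t = c * d + t;          /* FMA, waits for the previous one */
    t = e * f + t;          /* FMA, waits for the previous one */
    return t;
}

/* Reassociated form: the sum is split into two independent halves.
   The dependency chain is shorter, but a * b no longer has an addend
   of its own, so it becomes a separate MUL and the final step is a
   separate ADD. */
double sum_reassoc(double a, double b, double c, double d,
                   double e, double f, double g)
{
    double lo = a * b;      /* plain MUL */
    lo = c * d + lo;        /* FMA */
    double hi = e * f + g;  /* FMA, independent of lo */
    return lo + hi;         /* plain ADD */
}
```

The second form issues four floating-point instructions instead of three. A wide core can overlap the two halves, while a narrow core mostly pays for the extra instruction, which is consistent with the gain on Neoverse V1 and the loss on Neoverse N1 reported in this thread.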