https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114
--- Comment #3 from Wilco <wdijkstr at arm dot com> ---
(In reply to Richard Biener from comment #1)
> This is probably related to targetm.sched.reassociation_width where reassoc
> will widen a PLUS chain so several instructions will be executable in
> parallel without dependences.  Thus, (x + (y + (z + w))) -> (x + y) + (z + w).
> When all of them are fed by multiplications this goes from four fmas to two.
>
> It's basically a target request we honor so it works as designed.
>
> At some point I thought about integrating FMA detection with reassociation.

It should understand FMA indeed, A*B + p[0] + C*D + p[1] + E*F + p[2] can
become (((p[0] + p[1] + p[2]) + A*B) + C*D) + E*F.

Also we're missing a reassociation depth parameter. You need to be able to
specify how long a chain needs to be before it is worth splitting - the example
shows a chain of 5 FMAs is not worth splitting since FMA latency on modern
cores is low, but if these were integer operations (not MADD) then the chain
should be split.