https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114

--- Comment #3 from Wilco <wdijkstr at arm dot com> ---
(In reply to Richard Biener from comment #1)
> This is probably related to targetm.sched.reassociation_width where reassoc
> will widen a PLUS chain so several instructions will be executable in
> parallel without dependences.  Thus, (x + (y + (z + w))) -> (x + y) + (z + w).
> When all of them are fed by multiplications this goes from four fmas to two.
> 
> It's basically a target request we honor so it works as designed.
> 
> At some point I thought about integrating FMA detection with reassociation.
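To make the two shapes in the quoted comment concrete, here is a minimal C sketch with four products (the function names and operands are hypothetical, not the bug's testcase). Without -ffast-math, GCC keeps the association written in the source, so each function pins one shape. With contraction enabled (the GNU C default is -ffp-contract=fast) and a target that has FMA, one would expect the serial form to become one multiply feeding three dependent FMAs, and the width-2 form to become two multiplies, two independent FMAs and a final add; the exact instructions depend on the target and flags.

```c
/* Serial shape x + (y + (z + w)), every term a product: each addition
   waits for the one before it, so contraction can fold the products
   into a single dependent chain of FMAs. */
double sum_serial(double a, double b, double c, double d,
                  double e, double f, double g, double h)
{
    return a * b + (c * d + (e * f + g * h));
}

/* Width-2 shape (x + y) + (z + w): the two halves are independent and
   can execute in parallel, but each half can absorb only one product
   into an FMA; the other product becomes a plain multiply. */
double sum_split2(double a, double b, double c, double d,
                  double e, double f, double g, double h)
{
    return (a * b + c * d) + (e * f + g * h);
}
```

Both compute the same mathematical sum; they differ only in dependence structure and in how many multiplies can be fused.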

Indeed, it should understand FMA: A*B + p[0] + C*D + p[1] + E*F + p[2] can
become (((p[0] + p[1] + p[2]) + A*B) + C*D) + E*F, so that the three products
feed a single chain of FMAs.
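A sketch of the suggested ordering, under the same caveats: the names are hypothetical, and the intermediate assignments only pin the shape when GCC is not allowed to reassociate (no -ffast-math). The three non-product terms are summed first, then each product is folded into the running sum with an x*y + acc step that FMA contraction can turn into one instruction.

```c
/* The sum in the order it was written: products and p[i] terms interleaved. */
double sum_as_written(double A, double B, double C, double D,
                      double E, double F, const double p[3])
{
    return A * B + p[0] + C * D + p[1] + E * F + p[2];
}

/* Suggested shape: (((p[0] + p[1] + p[2]) + A*B) + C*D) + E*F.
   The plain additions come first and do not depend on any product;
   each later step has the form x*y + acc, a candidate for one FMA. */
double sum_fma_chain(double A, double B, double C, double D,
                     double E, double F, const double p[3])
{
    double acc = p[0] + p[1] + p[2];
    acc = A * B + acc;
    acc = C * D + acc;
    acc = E * F + acc;
    return acc;
}
```

For exactly representable inputs both functions return the same value; in general the two orders may round differently, which is why GCC only reassociates floating-point sums under -ffast-math (or -fassociative-math).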

Also, we're missing a reassociation depth parameter. You need to be able to
specify how long a chain must be before it is worth splitting. The example
shows that a chain of 5 FMAs is not worth splitting, since FMA latency on
modern cores is low; if these were integer operations (not MADD), however,
the chain should be split.
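The proposed depth parameter could be modelled as a simple gate in front of the width the target requests. The sketch below is hypothetical throughout: min_chain_len is not an existing GCC parameter, and chain_depth counts dependent operations rather than cycles, so a real heuristic would still need per-operation latencies (which is why an FMA chain and an integer add chain would want different thresholds).

```c
#include <assert.h>

/* Width to use for a chain of chain_len operands, given the width the
   target requests (in GCC that request comes from the
   targetm.sched.reassociation_width hook). min_chain_len is the
   hypothetical depth parameter: shorter chains are left serial. */
int effective_width(int chain_len, int target_width, int min_chain_len)
{
    assert(chain_len >= 1);
    if (chain_len < min_chain_len || target_width < 2)
        return 1;                 /* keep the chain serial */
    if (target_width > chain_len)
        return chain_len;         /* no more lanes than operands */
    return target_width;
}

/* Number of dependent operations on the critical path when n operands
   are summed in `width` independent partial sums that are then merged
   by a balanced tree of additions. Ignores per-operation latency. */
int chain_depth(int n, int width)
{
    assert(n >= 1 && width >= 1 && width <= n);
    int per_lane = (n + width - 1) / width;  /* operands in the longest lane */
    int depth = per_lane - 1;                /* dependent ops inside that lane */
    int lanes = width;
    while (lanes > 1) {                      /* one tree level per halving */
        lanes = (lanes + 1) / 2;
        depth++;
    }
    return depth;
}
```

With a hypothetical threshold of 6, a 5-operand chain stays serial (depth 4); with a threshold of 3 and a requested width of 2 it is split (depth 3).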
