On Tue, Jun 13, 2017 at 10:43:05AM +0100, Wilco Dijkstra wrote:
> Richard Earnshaw (lists) wrote:
> >
> > Why 1 and not 2? Many processors have 2 fp pipes and forcing this down
> > to a sequential stream is not obviously the right thing.
>
> 1 was faster than 2. Like I said, the reassociation is too aggressive and even
> splits multiply-add rather than keeping them. Until
On 12/06/17 11:50, Wilco Dijkstra wrote:
> Currently the FP reassociation width is set to 4 on AArch64. On recent
> GCCs this has become more aggressive in splitting expressions. This means
> many FMAs are split into FMUL and FADD. The reassociation increases register
> pressure, in some benchmarks so much that inner loops start to spill.
> This
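
For readers following along, here is a rough sketch of the two expression
shapes under discussion. It is hand-written C, not actual GCC output, and the
function names dot4_chain and dot4_split are made up for illustration. The
first form is roughly what a width of 1 preserves: one dependent chain in which
each step is a multiply feeding an add, so the compiler is free to contract it
to a fused multiply-add. The second mirrors what a wider reassociation tends to
produce: independent products summed as a tree, which exposes parallelism but
keeps more values live at once.

```c
/* Serial chain: each statement is "acc + x*y", the pattern that can be
 * contracted into a single fused multiply-add. Only one accumulator is
 * live across the chain. */
double dot4_chain(double a, double b, double c, double d,
                  double e, double f, double g, double h)
{
    double acc = a * b;
    acc = acc + c * d;
    acc = acc + e * f;
    acc = acc + g * h;
    return acc;
}

/* Tree shape: four separate multiplies followed by three adds.
 * All four products are computed before any add, so four temporaries
 * are live at the same time. */
double dot4_split(double a, double b, double c, double d,
                  double e, double f, double g, double h)
{
    double p0 = a * b;
    double p1 = c * d;
    double p2 = e * f;
    double p3 = g * h;
    return (p0 + p1) + (p2 + p3);
}
```

GCC only reassociates floating-point expressions like this when it is allowed
to, for example under -ffast-math. To experiment without rebuilding the
compiler, --param tree-reassoc-width=N overrides the target's default width.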