Hi Richard,

> The patch below is what I meant.  It passes bootstrap & regression-test
> on aarch64-linux-gnu (and so produces the same results for the tests
> that you changed).  Do you see any problems with this version?
> If not, I think we should go with it.

Thanks for the detailed example - unfortunately there are issues with it.
Early expansion means more instructions to deal with in RTL and fewer
optimizations - it even affects inlining (I see more calls/returns in the
instruction frequencies).

Worse, this change completely disables rematerialization of FP immediates,
which implies extra spilling. A basic example goes like this:

void g(void);
double bad_remat (double x)
{
  x += 5.347897294;
  g();
  x *= 5.347897294;
  return x;
}

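which with -O2 -fomit-frame-pointer -ffixed-d8 -ffixed-d9 -ffixed-d10
-ffixed-d11 -ffixed-d12 -ffixed-d13 -ffixed-d14 now compiles to the
following (the constant in d31 is spilled to [sp, 24] around the call and
reloaded from the stack, rather than being rematerialized):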

        adrp    x0, .LC0
        str     x30, [sp, -32]!
        ldr     d31, [x0, #:lo12:.LC0]
        str     d15, [sp, 8]
        fadd    d15, d0, d31
        str     d31, [sp, 24]
        bl      g
        ldr     d31, [sp, 24]
        fmul    d0, d15, d31
        ldr     d15, [sp, 8]
        ldr     x30, [sp], 32
        ret
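
To try this without a second source file, a self-contained variant of the
test case could look roughly like the sketch below. The stub g(), the
g_calls counter and the NOIPA macro are assumptions added for illustration
only; the original test leaves g external. The stub is marked noipa (or
noinline where noipa is unavailable) so that the call is neither inlined
nor analysed, which should keep the constant live across the call as in
the original.

```c
/* Stand-alone variant of the test case above.  The stub g(), the g_calls
   counter and the NOIPA macro are illustrative additions, not part of
   the original test.  */
#if defined(__has_attribute)
# if __has_attribute(noipa)
#  define NOIPA __attribute__((noipa))
# endif
#endif
#ifndef NOIPA
# define NOIPA __attribute__((noinline))
#endif

/* Counts calls so the stub has an observable side effect.  */
volatile int g_calls;

NOIPA void g(void)
{
  g_calls++;
}

double bad_remat (double x)
{
  x += 5.347897294;   /* first use of the FP constant */
  g();                /* call that the constant has to survive */
  x *= 5.347897294;   /* second use: reload, or rematerialize */
  return x;
}
```

Compiling this with -S and the options quoted above should make it easy to
compare the code around the call with and without the patch; the exact
output will depend on the GCC version.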

Recent changes have been moving in the opposite direction - keeping
high-level constructs (like GOT accesses) as a single operation works out
better for register allocation and allows more optimization.

So keeping FP immediates as standard move instructions until regalloc
is best. Supporting MOV/FMOV in regalloc would require another secondary
reload (and would then allow rematerialization of these constants).

Cheers,
Wilco
