Hi,

There is a performance regression caused by my previous change to aarch64_legitimize_address, in which I forced the constant offset out of the memory reference if the address expression is in the form "reg1 + reg2 << scale + const".  The intention was to reveal as many loop-invariant opportunities as possible, while depending on the GIMPLE optimizers to pick up CSE opportunities for "reg << scale" among different memory references.
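To make the address shape concrete, here is a small, purely illustrative C example (not taken from the patch or from any benchmark; the struct and function names are made up): both field accesses use addresses of the form "reg1 + reg2 << scale + const" and share the register scaling expression.

/* Illustrative only: on LP64, p[i].x and p[i].y are addressed as
   "p + (i << 4) + 0" and "p + (i << 4) + 8", so the scaling expression
   "i << 4" is common to both memory references.  */

struct pair { long x; long y; };

long
sum_fields (struct pair *p, long i)
{
  return p[i].x + p[i].y;
}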
Though the assumption still holds, the GIMPLE optimizers are not powerful enough to pick up CSE opportunities for register scaling expressions at the current time.  This patch is a workaround: it forces the register scaling expression out of the memory reference instead, so that the RTL CSE pass can handle common register scaling expressions, of course at the cost of possibly missed loop invariants.  James and I collected perf data; fortunately this change improves performance for several cases in various benchmarks and doesn't cause any big regression.  It also recovers the big regression we observed before with the previous change.  I added a comment explaining why the workaround is necessary, and filed PR69653 as an example showing that the tree optimizers should be improved.

Bootstrapped and tested on AArch64.  Is it OK?

Thanks,
bin

2016-02-04  Bin Cheng  <bin.ch...@arm.com>

	* config/aarch64/aarch64.c (aarch64_legitimize_address): Force
	register scaling out of memory reference and comment why.
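To illustrate the trade-off, here is another hypothetical example (again not from the collected perf data): with the constant split out of the memory reference, the new base of the second load, roughly "p + 8", is loop-invariant and could be hoisted; with the register scaling expression split out instead, the two loads in each iteration can share "p + (i << 4)" through RTL CSE.

/* Illustrative only: the same access pattern inside a loop.  Splitting
   the constant keeps the new base computation loop-invariant; splitting
   the scaling lets RTL CSE share "i << 4" between the two loads.  */

struct pair { long x; long y; };

long
sum_pairs (struct pair *p, long n)
{
  long s = 0;
  for (long i = 0; i < n; i++)
    s += p[i].x + p[i].y;
  return s;
}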
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03bc1b9..24a7d5b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4926,13 +4926,18 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
 	     Rt = Ra + Rc;
 	     addr = Rt + Rb<<SCALE.
 
-	 Here we split CONST out of memory referece because:
+	 TODO: We really should split CONST out of memory referece
+	 because:
 	   a) We depend on GIMPLE optimizers to pick up common sub
 	      expression involving the scaling operation.
 	   b) The index Rb is likely a loop iv, it's better to split
 	      the CONST so that computation of new base Rt is a loop
 	      invariant and can be moved out of loop.  This is more
-	      important when the original base Ra is sfp related.  */
+	      important when the original base Ra is sfp related.
+
+	 Unfortunately, GIMPLE optimizers (e.g., SLSR) can not handle
+	 this kind of CSE opportunity at the time of this change, we
+	 have to force register scaling expr out of memory ref now.  */
       else if (REG_P (op0) || REG_P (op1))
 	{
 	  machine_mode addr_mode = GET_MODE (x);
@@ -4942,14 +4947,13 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
 	  if (REG_P (op1))
 	    std::swap (op0, op1);
 
-	  rtx addr = gen_rtx_PLUS (addr_mode, op1, base);
+	  rtx addr = plus_constant (addr_mode, base, offset);
 	  if (aarch64_legitimate_address_hook_p (mode, addr, false))
 	    {
-	      base = force_operand (plus_constant (addr_mode,
-						   op0, offset),
+	      base = force_operand (gen_rtx_PLUS (addr_mode, op1, op0),
 				    NULL_RTX);
-	      return gen_rtx_PLUS (addr_mode, op1, base);
+	      return plus_constant (addr_mode, base, offset);
 	    }
 	}
     }